Longform Markup Language

Language Intro

Longform is an easy-to-read markup and templating language that outputs HTML and XML. A Longform document can be parsed into a complete document in the output format, or into fragments that an application uses as a source of markup when generating a document or manipulating the DOM in a browser environment.

Example Longform markup

header::
  hgroup::
    h1:: Longform Markup Language
    p::
      A markup and templating language for producing <b class=keyword>HTML</b>
      and <b class=keyword>XML</b> document fragments.

Result

<header>
  <hgroup>
    <h1>Longform Markup Language</h1>
    <p>
      A markup and templating language for producing <b class="keyword">HTML</b>
      and <b class="keyword">XML</b> document fragments.
    </p>
  </hgroup>
</header>

Markdown is a popular all-round markup language, but it limits which HTML elements and markup can be represented without extensions or falling back to HTML. The Longform syntax, by contrast, adds no constraints on the possible markup of the output format. As a result, Longform plays well with custom HTML elements and with elements and attributes that might be added to HTML in the future. And because it can express elements such as <nav> or <head>, Longform can be used effectively for static content markup in regions of a website beyond the main content.

Longform also supports directives that alter how a block of Longform or plain HTML / XML is processed. This specification will formalize directives which may be used in a browser environment using a minimal Longform parser.

For example, in a Longform template the directives @allow-elements and @allow-attributes can be used to accept user-defined markup while filtering out any markup that is not allowed. User input written in the Longform syntax will have the filtering rules applied by the Longform parser itself, while embedded HTML or XML content requires an external sanitizer such as DOMPurify to be configured to hook into the parser and apply the rules to the user-defined content.

Longform template using sanitization directives

@template
##card
section.card.note-#{position}::
  header::
    h3:: #{title}
    
  @allow-attributes:: lang dir
  @allow-elements:: details[open name] summary h4 p strong em a[href target]
  div.card-content::
    ##{content}

Finally, the work-in-progress Longform parser is small and fast. It is currently 2.6 kB when minified and gzipped, and it already supports many of the features intended for the browser environment. The parser is likely to grow, but it is unlikely to come near the size of CommonMark at 47.6 kB or Marked at 12 kB. The parser is also fast because it builds the resulting HTML fragment strings line by line, instead of using a two-step process of constructing an abstract syntax tree and then forming valid output markup from it.

Fragments

Longform's primary design goal is to output fragments of HTML into a form that can be merged with other sources and rendered into a complete document by another program, and to do so in such a way that a client side runtime can extract the fragments from rendered DOM and re-use those fragments in a client application without duplicating data in another form during transport.

To achieve this, depending on the fragment kind, a Longform parser will embed HTML ids or data attributes in a fragment and export additional meta information that can be embedded into the rendered document to assist the client when extracting the fragments from the rendered DOM.

Note that not all fragment kinds can be transported once in the rendered document without duplication. Text fragments allow the client to present messages where HTML markup is not supported. For example, if a client were to dynamically render an aria-label value, any included HTML markup would be presented or read to the user as text. Because no HTML markup wraps the content of a text fragment, there is no predictable and straightforward way to annotate it so that the client can extract it from the DOM.

However, allowing text only fragments makes Longform a suitable format for transporting all translatable text and markup in the language specified by the client. If all textual content is placed in a Longform document for a given language, and merged into non-textual content to make a complete application state, only the Longform document will require translation to support other languages; even if the page has lots of textual content embedded in interactive content rendered by the client.

Fragment identification

Most Longform fragments have a Longform identifier that appears before their markup block. When the Longform document is parsed and exported to the parent program, each exported fragment is referenceable by its fragment identifier. Within the Longform document, fragments can also embed other fragments using the syntax #[other-identifier]. This behaviour is described in detail in the Embedding fragments section. Some fragment identifiers are intended to be unique to the document; such fragments will not embed twice, and will not be exported if they are used within the document.

The Root Fragment

The Root Fragment is an optional fragment that has no Longform identifier. There can be only one Root Fragment in a Longform document and its primary use is for outputting complete documents.

After a Root Fragment is found while parsing a Longform document, all other fragments that do not have Longform identifiers are ignored when rendering the output markup. The Root Fragment is often preceded by the @doctype or @xml directive, which adds an HTML doctype or XML declaration to the start of the output markup.

Example Root Fragment using the @doctype directive

@doctype:: html
html::
  head::
    title:: Example Root Fragment
  body::
    h1:: Example Root Fragment

Result

<!doctype html>
<html>
  <head>
    <title>Example Root Fragment</title>
  </head>
  <body>
    <h1>Example Root Fragment</h1>
  </body>
</html>

Embedded fragment identifiers

Embedded fragment identifiers begin with a single # hash character followed by a single ASCII letter, optionally followed by any number of ASCII letters, digits, hyphens and underscores.

The fragment identifier cannot have any whitespace or other characters preceding it on its line. Only whitespace can follow the fragment identifier; apart from line breaks, that whitespace should be ignored by the Longform parser.

The Longform markup block must be declared on a line following the fragment identifier, and can be separated from it only by lines of whitespace or comments. The fragment's markup block must have a single outermost element declared in the Longform syntax and not as HTML. No whitespace can precede the declaration of the outer element.

Example element with an embedded identifier

#embedded-id
section::
  p:: A fragment that can be referenced by its <a href=#embedded-id>identifier</a>

Result

<section id="embedded-id">
  <p>
    A fragment that can be referenced by its <a href="#embedded-id">identifier</a>
  </p>
</section>

Bare fragments

Bare fragment identifiers begin with two # hash characters followed by a single ASCII letter, optionally followed by any number of ASCII letters, digits, hyphens and underscores.

##alert-something-went-wrong
dialog[open].error::
  p::
    strong:: Something went wrong!
  form[method=dialog]::
    button:: Close

Result

<dialog data-lf="alert-something-went-wrong" open class="error">
  <p>
    <strong>Something went wrong!</strong>
  </p>
  <form method="dialog">
    <button>Close</button>
  </form>
</dialog>

Range fragments

Range fragment identifiers begin with a single # hash character followed by a single ASCII letter, optionally followed by any number of ASCII letters, digits, hyphens and underscores. The range fragment type is identified by an [ opening square bracket separated from the identifier by whitespace.

The fragment identifier cannot have any whitespace or other characters preceding it on its line. Only whitespace can follow the fragment identifier and the opening bracket; apart from line breaks, that whitespace should be ignored by the Longform parser.

#head-details [
  title:: The range fragment
  meta::
    [name=description]
    [content=Demonstrating the range fragment]
]

Result

<title data-lf="head-details">The range fragment</title>
<meta data-lf="head-details" name="description" content="Demonstrating the range fragment" />

Text fragments

Text fragments do not include any elements. Programs using Longform output can use text fragments in locations where elements are not allowed such as HTML attributes. Text fragments are particularly useful where Longform is being used as a master document for all translated copy for a webpage.

#aria-label "
  Create a recipe
"

Result

Create a recipe
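
The four identifier forms described above can be told apart from the identifier line alone. The following Python sketch is illustrative only and is not part of any Longform parser: the function name classify_identifier is hypothetical, and it assumes that the opening quote of a text fragment is separated from the identifier by whitespace in the same way as the opening bracket of a range fragment.

```python
import re

# Identifier grammar from the text: one ASCII letter, then any number of
# ASCII letters, digits, hyphens and underscores.
IDENT = r"[A-Za-z][A-Za-z0-9_-]*"

# Every pattern is matched against the whole line, so nothing may precede
# the identifier and only spaces or tabs may trail it.
PATTERNS = [
    ("bare", re.compile(rf"##({IDENT})[ \t]*")),
    ("range", re.compile(rf"#({IDENT})[ \t]+\[[ \t]*")),
    # Assumption: the quote is separated by whitespace, as in the example.
    ("text", re.compile(rf'#({IDENT})[ \t]+"[ \t]*')),
    ("embedded", re.compile(rf"#({IDENT})[ \t]*")),
]

def classify_identifier(line):
    """Return (kind, identifier), or None if the line is not an identifier."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(line)
        if match:
            return kind, match.group(1)
    return None
```

A line that starts with whitespace, or whose identifier starts with a digit, is rejected by returning None.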

Whitespace

Whitespace is meaningful in Longform. Any markup indented two spaces further than an element is output as a child of that element.

The exceptions are when text or native markup of the output language is being processed, and when the parser is inside a preformatted block.
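
As a rough illustration of the indentation rule, the Python sketch below derives each line's parent from its leading spaces. The helper names are hypothetical, rejecting indentation that is not a multiple of two is an assumption, and the exceptions for text, native markup and preformatted blocks are not handled.

```python
def indent_depth(line, indent_width=2):
    """Return the nesting depth implied by a line's leading spaces."""
    spaces = len(line) - len(line.lstrip(" "))
    if spaces % indent_width:
        # Assumption: partial indentation is treated as an error.
        raise ValueError(f"{spaces} spaces is not a multiple of {indent_width}")
    return spaces // indent_width

def parent_of_each(lines):
    """Map each non-blank line to its parent line (None at the top level)."""
    stack = []    # stack[d] holds the most recent line seen at depth d
    parents = {}
    for line in lines:
        if not line.strip():
            continue
        depth = indent_depth(line)
        del stack[depth:]    # leave only the ancestors of this line
        parents[line.strip()] = stack[-1] if stack else None
        stack.append(line.strip())
    return parents
```

For the header example in the introduction, this maps hgroup:: to header:: and the h1:: line to hgroup::.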

Declaring elements

Element tags

A lone element tag is output by writing the element name followed by two colons ::.

div::

Element attributes

Element attributes are declared after the tag and are wrapped in square brackets [].

div[data-foo=bar][aria-describedby=#baz]::

Alternatively, attributes can follow directly after the element tag with one level of indentation.

div::
  [data-foo=bar]
  [aria-describedby=#baz]

If an attribute is declared multiple times, its values are concatenated into a single value. This behaviour does not apply to the element's id when it is defined using the attribute syntax. Classes are concatenated with a space separating them.

meta::
  [name=description]
  [content=Lorem ipsum dolor sit amet, consectetur adipiscing elit.]
  [content=Quisque a sem et nisl mollis porttitor et sit amet neque.]
  [content=Maecenas suscipit nulla ac suscipit imperdiet. Quisque]
  [content=odio nisi, semper non dui quis, feugiat faucibus ipsum.]

Element output ids

An element's id in the output markup can be declared on the line before the tag, at the same indentation level, with the hash # symbol prefixing the id.

This form of giving an element an id also gives it a meaningful Longform identifier.

#element-id
div::

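Alternatively, the id can follow the tag name, before the closing double colons, again with a hash prefixing it. Unlike the form where the id is declared on the preceding line, this form does not give the element a Longform identifier.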

div#element-id::

And finally the id can be declared using the attribute syntax.

div[id=element-id]::
<!-- or -->
div::
  [id=element-id]

If an element has an id declared for it twice, only the first declaration is used.

Element classes

Classes can be defined following the element's tag declaration, with a period . prefixing each class.

div#element-id.class-1.class-2.class-3::

Alternatively classes can be defined using the attribute syntax on the lines following the tag definition.

div#element-id::
  [class=class-1]
  [class=class-2 class-3]
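
The rules for repeated attribute declarations, ids and classes can be summarised in a small Python sketch. The function name merge_attributes is hypothetical. The separator used when joining repeated values of attributes other than class is not stated in this document, so the default of a single space is an assumption.

```python
def merge_attributes(declarations, separator=" "):
    """Merge (name, value) pairs, given in declaration order, into a dict."""
    merged = {}
    for name, value in declarations:
        if name not in merged:
            merged[name] = value
        elif name == "id":
            continue  # only the first id declaration is used
        elif name == "class":
            merged[name] += " " + value  # classes are joined with a space
        else:
            # Assumption: the separator for other attributes is unspecified.
            merged[name] += separator + value
    return merged

attributes = merge_attributes([
    ("id", "element-id"),
    ("class", "class-1"),
    ("class", "class-2 class-3"),
    ("id", "ignored-id"),
])
print(attributes)
# {'id': 'element-id', 'class': 'class-1 class-2 class-3'}
```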

Element text and native markup content

Elements can have text and native markup following the tag declaration with a space between the text and the double colons of the element definition.

div:: Text content with <em>some</em> native markup.

Alternatively, the text and native markup can follow the tag declaration with one extra indentation level.

div::
  Text content with <em>some</em> native markup.

Chained elements

Elements can be chained to create many elements in one line.

menu::
  li::a[href=/section1]::b:: Section 1
  li::a[href=/section2]::b:: Section 2
  li::a[href=/section3]::b:: Section 3
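
One way to read a chained line is to peel element declarations off the front until the remaining text no longer starts with one. The Python sketch below is an assumption about how a parser might do this, not a description of the actual Longform parser, and split_chain is a hypothetical name. It treats anything inside square brackets as part of the declaration, so colons inside attribute values do not end the element.

```python
import re

# An element declaration: characters that are not whitespace, colons or
# brackets, mixed with complete [attribute] groups, closed by two colons.
ELEMENT = re.compile(r"((?:[^\s:\[\]]|\[[^\]]*\])+)::")

def split_chain(line):
    """Return (element declarations, trailing text) for one Longform line."""
    elements = []
    rest = line.strip()
    while True:
        match = ELEMENT.match(rest)
        if not match:
            break
        elements.append(match.group(1))
        rest = rest[match.end():]
    return elements, rest.strip()

print(split_chain("  li::a[href=/section1]::b:: Section 1"))
# (['li', 'a[href=/section1]', 'b'], 'Section 1')
```

The chain ends at the first space after a double colon, so text that happens to contain two colons is left alone.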

Preformatted blocks

Longform does not assign special meaning to any HTML tags, so to retain its formatting, content can be wrapped in curly braces to create a preformatted block.

Escaped preformatted block

When an element is followed by a single curly brace, its content is HTML escaped and its whitespace is preserved at the children's indent level.

pre::
  code:: {
    div::
      <p>
        This content will preserve its formatting including
        the parent <code>div::</code>
      </p>
  }

Result

<pre><code>
  div::
    &lt;p&gt;
      This content will preserve its formatting including
      the parent &lt;code&gt;div::&lt;/code&gt;
    &lt;/p&gt;
</code></pre>
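
The escaping step can be illustrated with the Python standard library's html.escape. This is a sketch of what HTML escaping means for the block's content, not the parser's implementation; escape_block is a hypothetical name, and leaving quotation marks unescaped is an assumption, since the text does not say how they are treated.

```python
import html

def escape_block(text):
    """HTML-escape the content of an escaped preformatted block."""
    # quote=False replaces only &, < and >, leaving quotation marks alone.
    return html.escape(text, quote=False)

print(escape_block("the parent <code>div::</code>"))
# the parent &lt;code&gt;div::&lt;/code&gt;
```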

Un-escaped preformatted text

When an element is followed by two curly braces, its formatting and content are kept intact.

script:: {{
  console.log('Hello, World!');
}}

style:: {{
  div {
    color: red;
  }
}}

Result

<script>
  console.log('Hello, World!');
</script><style>
  div {
    color: red;
  }
</style>

Comments

Comments can be written outside of a fragment by starting them with -- two hyphens. Within a fragment, comments in the output markup language's own syntax can be used, but they are written to the output alongside all other native markup.

Embedding fragments

Fragments can be embedded into other fragments as part of Longform parsing, allowing a full document to be created from many fragments. Fragments are embedded using the syntax #[fragment-id].

@root
@doctype:: html
html[lang=en]::
  body:: 
    header::
      menu::
        li::a[href=#section1]:: Section 1
        li::a[href=#section2]:: Section 2
    main::
      #[section1]
      #[section2]

#section1
section::
  ...

#section2
section::
  ...
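
Embedding can be pictured as substituting each #[fragment-id] reference with the fragment registered under that identifier. The Python sketch below works on already rendered strings and is only an illustration: embed_fragments is a hypothetical name, leaving unknown references untouched and rejecting circular references are both assumptions, and the rule that some fragments embed only once is not enforced.

```python
import re

# Same identifier grammar as fragment identifiers, wrapped in #[ and ].
EMBED = re.compile(r"#\[([A-Za-z][A-Za-z0-9_-]*)\]")

def embed_fragments(markup, fragments, _seen=()):
    """Replace each #[id] in markup with the fragment stored under id."""
    def replace(match):
        name = match.group(1)
        if name in _seen:
            raise ValueError(f"fragment {name!r} embeds itself")
        if name not in fragments:
            return match.group(0)  # assumption: keep unknown references
        # Fragments may embed further fragments, so expand recursively.
        return embed_fragments(fragments[name], fragments, _seen + (name,))
    return EMBED.sub(replace, markup)

fragments = {
    "section1": '<section id="section1">One</section>',
    "section2": '<section id="section2">Two</section>',
}
print(embed_fragments("<main>#[section1]#[section2]</main>", fragments))
# <main><section id="section1">One</section><section id="section2">Two</section></main>
```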

Templating

Longform templates accept only key-value pairs as input, and the values must be strings. The template is intended to be passed to the client, where it might be rendered using client-side state. It is assumed that the client has its own optimized means of evaluating conditions and iterating, so that functionality is left to the client's scripting language to keep the templating logic lean.

@template
div::
  h3:: Recipe step #{position}
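
A minimal Python sketch of single hash expansion follows. It assumes that #{name} values are HTML escaped, which is inferred from the double hash form being described elsewhere in this document as non-escaped; the variable name grammar and the function name expand_variables are also assumptions.

```python
import html
import re

# Match #{name} but not the double hash form ##{name}.
# Assumption: names are ASCII letters, digits and underscores.
VARIABLE = re.compile(r"(?<!#)#\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_variables(template, values):
    """Expand #{name} using string values; a missing key raises KeyError."""
    def replace(match):
        value = values[match.group(1)]
        if not isinstance(value, str):
            raise TypeError("template values must be strings")
        return html.escape(value)  # assumption: single hash output is escaped
    return VARIABLE.sub(replace, template)

print(expand_variables("h3:: Recipe step #{position}", {"position": "3"}))
# h3:: Recipe step 3
```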

Templated markup

Arbitrary Longform and HTML strings can be inserted into a template using the double hash variable expansion form ##{var}. See the Content sanitization section for the rules on sanitizing untrusted input that uses this form.

@template
@allow-elements:: strong em
div::
  ##{markup}

Content sanitization

A Longform block can have sanitization rules applied to its content using the @allow-elements, @allow-attributes, @allow-data-attributes and @allow-all directives. These directives are designed to play well with the SanitizerConfig Web API, but for now a sanitizer library must be shipped with the parser to apply the directive rules to any content.

Sanitizer rules apply when expanding variables in a templating context using the double hash expansion syntax ##{var} and in situations where @patchable is used.

Sanitizer defaults

Longform cannot sanitize raw HTML without a sanitizer library to parse the HTML input, and it cannot differentiate between text and HTML markup. If the double hash template expansion is used where no sanitizer is configured, the variable expansion SHOULD be ignored by parser implementations. Parsers MAY support an option to bypass this document-level default.

Likewise, when a document specifies no rules allowing elements, attributes or data attributes, parsers SHOULD ignore all input that uses the double hash variable expansion. Again, a parser might offer options equivalent to the sanitizer directives to bypass this default behaviour.
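
The defaults above amount to a small decision procedure, sketched here in Python. Everything in it is hypothetical: the function name, the shape of the rules argument, the sanitize callback standing in for an external sanitizer library, and the bypass flag standing in for the opt-out a parser MAY offer. Treating "ignored" as producing no output is also an assumption.

```python
import re

MARKUP_VARIABLE = re.compile(r"##\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_markup(template, values, rules=None, sanitize=None, bypass=False):
    """Expand ##{name}, dropping the value when it cannot be sanitized."""
    def replace(match):
        value = values.get(match.group(1), "")
        if bypass:
            return value  # the opt-out a parser MAY offer
        if sanitize is None or not rules:
            return ""     # SHOULD be ignored: no sanitizer or no allow rules
        return sanitize(value, rules)
    return MARKUP_VARIABLE.sub(replace, template)
```

With neither a sanitizer nor rules, "div:: ##{markup}" expands to "div:: " regardless of the value supplied.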

Element specific sanitizer rules

An element can have rules applied to it that allow arbitrary markup to be added to the document. Either @allow-elements or @allow-all must be used for any elements to persist.

@template
#embedding-content
section::
  header::
    h2:: #{header}
  @allow-elements:: strong em a[href target]
  @allow-attributes:: class
  div::
    ##{markup}

Sanitization rules are inherited by child elements, so the following Longform would produce the same results.

@template
@allow-elements:: strong em a[href target]
@allow-attributes:: class
#embedding-content
section::
  header::
    h2:: #{header}
  div::
    ##{markup}

Global settings

Settings can be configured document wide using the @global directive from the top level of the document.

@global::
  @allow-elements:: strong em a[href target]
  @allow-attributes:: class
  
@template
#template1
section::
  ##{markup}

@template
#template2
section::
  ##{markup}
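
Inheritance from the @global block down to an element can be modelled as a chain of scopes in which the closest declaration wins. The Python sketch below uses collections.ChainMap for this. Resolving each directive independently, rather than replacing the whole rule set at once, is an assumption, and resolve_rules is a hypothetical name.

```python
from collections import ChainMap

def resolve_rules(element_rules, fragment_rules, global_rules):
    """Return a view in which the closest scope wins for each directive."""
    # ChainMap searches its maps left to right, so the element's own
    # directives shadow the fragment's, which shadow the @global block.
    return ChainMap(element_rules, fragment_rules, global_rules)

rules = resolve_rules(
    {"allow-elements": "strong em"},                  # on the element
    {},                                               # on the fragment
    {"allow-elements": "strong em a[href target]",    # in @global
     "allow-attributes": "class"},
)
print(rules["allow-elements"])    # strong em
print(rules["allow-attributes"])  # class
```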

Directives

@url

Sets the URL of the Longform document. An HTTP GET request to the @url with the Accept header set to text/longform should produce the same document, unless it has since been modified.

@url:: https://example.com/blog/article-1

@patchable

Asserts to the client that the document can be patched using an HTTP PATCH request with the Content-Type header set to text/longform. The @patchable directive should be ignored unless the @url directive is used.

@url:: http://example.com/blog/longform-1
@patchable
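
The two requests implied by @url and @patchable could be built as follows with Python's urllib.request. This is a sketch only: the helper names are hypothetical, nothing is sent over the network, and the assumption that the PATCH body is Longform text follows from the Content-Type header but is not spelled out in this document.

```python
import urllib.request

LONGFORM_TYPE = "text/longform"

def build_fetch(url):
    """GET request asking for the Longform source of a document."""
    return urllib.request.Request(
        url, headers={"Accept": LONGFORM_TYPE}, method="GET"
    )

def build_patch(url, longform_source):
    """PATCH request carrying edited Longform for a @patchable document."""
    return urllib.request.Request(
        url,
        data=longform_source.encode("utf-8"),  # assumption: body is Longform
        headers={"Content-Type": LONGFORM_TYPE},
        method="PATCH",
    )

fetch = build_fetch("https://example.com/blog/article-1")
print(fetch.get_method(), fetch.get_header("Accept"))
# GET text/longform
```

Passing either request to urllib.request.urlopen would perform the call; that step is left out because the server behaviour is outside the scope of this sketch.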

@doctype

Inserts a doctype declaration at the beginning of a fragment.

@doctype:: html
html[lang=en]::
  head::
    ...
  body::
    ...

Result

<!doctype html>
<html lang="en">
  <head>...</head>
  <body>...</body>
</html>

@xml

Inserts an XML declaration at the beginning of a fragment.

@xml:: version="1.0" encoding="UTF-8"
html::
  [xmlns=http://www.w3.org/HTML/1998/html4]
  [xmlns:xdc=http://www.xml.com/books]
  body::
    ...

Result

<?xml version="1.0" encoding="UTF-8"?>
<html
  xmlns="http://www.w3.org/HTML/1998/html4"
  xmlns:xdc="http://www.xml.com/books"
>
  <body>
    ...
  </body>
</html>

@template

Marks a fragment as a client-side template. When a fragment is a template, the Longform parser skips formatting it and outputs it separately from the processed fragments so that it can be passed through to the client. Client-side logic can then pass the template into a special Longform template parser and receive the HTML output.

@template
#button-text "
  Add new #{entityName}
"

@editable

Marks the children of an element as editable in a patch request. The element cannot be within a template and must have a Longform id set on it.

@editable
#edit-me
div::
  This content is editable.

@global

Applies directive rules to an entire Longform document. Directives applied before or within a fragment will typically override globally set rules.

@url:: https://example.com/pages/article-1
@patchable
@global::
  @allow-elements:: h4 p strong em b i small hr br

@allow-elements

In a template, this directive tells the parser which elements can be rendered when applying non-escaped variable expansion within its scope. If used in a patchable document, client-side editors should limit which elements can be edited in the editable element. The directive's rules should also be used to sanitize or reject input when merging edits from an HTTP PATCH request into the document on the server.

Attributes can be allowed on specific elements by listing them in square brackets directly after the element in the directive's arguments.

If neither this directive nor @allow-all is used, all elements should be filtered or rejected when editing or applying variable expansion.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-elements:: a[href target] p strong em
div::
  Edit me!
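
The argument format of @allow-elements, element names with optional attribute lists in square brackets, is simple enough to parse with one regular expression. The Python sketch below is illustrative: parse_allow_elements is a hypothetical name, the element name grammar is an assumption, and input that does not fit the pattern is silently skipped rather than reported.

```python
import re

# An element name, optionally followed by [space separated attributes].
ALLOWED = re.compile(r"([A-Za-z][A-Za-z0-9-]*)(?:\[([^\]]*)\])?")

def parse_allow_elements(arguments):
    """Map each allowed element to the attributes allowed on it."""
    rules = {}
    for name, attributes in ALLOWED.findall(arguments):
        rules[name] = attributes.split()  # no brackets gives an empty list
    return rules

print(parse_allow_elements("a[href target] p strong em"))
# {'a': ['href', 'target'], 'p': [], 'strong': [], 'em': []}
```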

@allow-attributes

In a template, this directive tells the parser which attributes can be rendered when applying non-escaped variable expansion within its scope. If used in a patchable document, client-side editors can limit which attributes can be added to the markup. The directive's rules should also be used to sanitize or reject input when merging edits from an HTTP PATCH request into the document on the server.

If neither this directive nor @allow-all is used, all attributes should be filtered or rejected when editing or applying variable expansion.

@editable
@allow-attributes:: id name class
@allow-elements:: a[href target] p strong em form label[for] input button[submit]
div::
  Edit me...

@allow-data-attributes

In a template, this directive tells the parser to allow data-attributes to be rendered when applying non-escaped variable expansion within its scope. If used in a patchable document, client-side editors can allow data-attributes to be added to the markup.

If neither this directive nor @allow-all is used, all data-attributes should be filtered or rejected when editing or applying variable expansion.

@editable
@allow-data-attributes
@allow-attributes:: id name class
@allow-elements:: a[href target] p strong em form label[for] input button[submit]
div::
  Edit me...

@allow-all

This directive instructs the parser to allow all elements, attributes and data-attributes when performing non-escaped variable expansion or when editing a patchable document.

@editable
@allow-all
body::
  Edit me...
