
Strict YAML

YAML is great. I like writing YAML more than JSON, but sometimes, I feel that YAML has gone a tad too far. Its various features sometimes make it hard to keep track of what is allowed and what is not. For example:

- question: How cool is YAML?
  answer: YAML is pretty awesome :)
- question: Do you think YAML is perfect?
  answer: No

There are two pitfalls here. First, the colon within the :) is suspicious: the parser uses colons to separate key-value pairs, whereas we intended this one to be part of the string. Strictly speaking, a colon only acts as a separator when whitespace follows it, so many parsers accept this line as written, but that is exactly the kind of rule that is hard to keep track of. The safe fix is to enclose the entire value in quotes, so that it is always parsed as a string. The second pitfall is a similar misinterpretation: parsers that follow YAML 1.1 interpret No as an alias for false. We wanted string key-value pairs, but when we try to render them into a document, we get a surprising false printed.
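To make the boolean surprise concrete, here is a minimal sketch of how a parser following the YAML 1.1 boolean type might classify an unquoted value. The helper name resolve_plain_scalar is made up for illustration, and real parsers may implement only part of this list or none of it.

```python
import re

# Unquoted spellings that the YAML 1.1 boolean type treats as true or false.
# Assumption: a real parser may recognise only part of this list.
YAML11_TRUE = re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON")
YAML11_FALSE = re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF")


def resolve_plain_scalar(text):
    """Hypothetical helper: type an unquoted value the way YAML 1.1 booleans work."""
    if YAML11_TRUE.fullmatch(text):
        return True
    if YAML11_FALSE.fullmatch(text):
        return False
    return text  # everything else stays a string in this sketch
```

With this, resolve_plain_scalar("No") returns False, while resolve_plain_scalar("Nope") stays the string "Nope". Quoting the value sidesteps the question entirely, because quoted values never go through this kind of resolution.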

Is that an issue?

For seasoned developers, none of it is really problematic. Worst case scenario, an oversight happens, it fails to parse, fix it, move on.

But for beginners, it's a bad experience. Indentation with spaces only, all these intricacies that you must know. It is intended to be human-readable and writable, but people who are not comfortable writing code will have a bad time writing YAML, too. In a way, JSON may not be the most ergonomic language to read and write, but it sure is simpler than YAML.

The primary issue here is that the wide range of features (like quoteless values, aliases for booleans, multiline strings with whitespace control, etcetera) simultaneously simplifies and complicates things. Many of the features YAML has don't really have to exist; that is to say, removing them from the YAML specification would have no bearing on the actual capabilities of the language.

Strict YAML

In comes Strict YAML. This is a so-far hypothetical language I've imagined. It strips down YAML to the barest of bones, into a language that is less featureful but whose rules are simpler to understand. And, naturally, it's a subset of YAML; in other words, Strict YAML is always valid YAML.

Hypothetically, Strict YAML could just be JSON. JSON is already technically a valid subset of YAML. But still, YAML has some features that I think are rather nice, especially for use in places like a static site generator. One of the features I've come to like quite a lot is multiline (unquoted) strings, which are a great alternative to the {% block … %} syntax that Liquid and Nunjucks have. Being able to write readable content-like data within the front matter of a template in YAML is great, and I feel Strict YAML should have it, too. On the other hand, I don't care much for (and have never used) aliases for booleans; I've always just stuck to true and false for clarity. Similarly, Strict YAML can restrict the allowed format for dates to always be YYYY-MM-DDThh:mm:ss.sss (or a subset of it, though it should always have at least a year and a month specified). For string values, I actually feel that, apart from the quoteless multiline string syntax, requiring quotes around single-line strings would be fine and create more consistency. So, we can then write:

- question: "How cool is YAML?"
  short_answer: "YAML is pretty awesome :)"
  long_answer: |
    YAML is great, but its many features
    can _sometimes_ create confusion.

Also, we need to address the topic of indentation. Personally, I find that the only sensible amount of indentation for YAML (and this applies to YAML specifically) is two spaces. Tabs are out of the question by specification, and arrays are in a way indented with an indent width 2 already, since they indent with - . When using 4 spaces for indentation (or any other amount), I find myself struggling with arrays. Do I indent arrays with two spaces, a dash, and a space? If I choose that, then it feels like "normal" 4-space indentation but the third space is arbitrarily replaced with a dash. I then cannot use my tab key, because it inserts 4 spaces. The alternative is 4 spaces and then the - , but that ends up looking very inconsistent because the objects in the array appear to be indented with 6 spaces, even though I was using 4 spaces for indentation. The nested keys then are off by two spaces, causing me to have to manually manage the spaces in my indentation.

Perhaps this is a tooling issue. But even if my editor did it "right", whatever "right" would mean, I wouldn't necessarily like the result. With two spaces, these problems more or less vanish. The "- " prefix no longer really feels like indentation (I don't have to mash the spacebar or even press the tab key).

Strict YAML rules

First, indent with two spaces only. Then, primitive values such as numbers, booleans, strings, and null follow the rules of JSON primitives; whatever the value, JSON can parse it (correctly). The exceptions are NaN, which YAML denotes as .NaN, and plus and minus infinity, which it denotes as .Inf and -.Inf. These values remain valid in Strict YAML.
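As a rough sketch of the primitive rule, a parser could hand almost everything to a JSON parser and special-case the three extra floats. The helper name parse_primitive is hypothetical, and the code assumes it receives the text of a single value with its surrounding whitespace already removed.

```python
import json
import math

# The three float spellings that Strict YAML keeps and JSON cannot express.
SPECIAL_FLOATS = {".NaN": math.nan, ".Inf": math.inf, "-.Inf": -math.inf}


def _reject_constant(name):
    # Python's json.loads accepts NaN, Infinity and -Infinity by default,
    # which strict JSON does not allow, so refuse them here.
    raise ValueError(f"{name} is not a valid primitive")


def parse_primitive(text):
    """Hypothetical helper: parse one Strict YAML primitive or raise ValueError."""
    if text in SPECIAL_FLOATS:
        return SPECIAL_FLOATS[text]
    value = json.loads(text, parse_constant=_reject_constant)
    if isinstance(value, (list, dict)):
        raise ValueError("arrays and objects are not primitives")
    return value
```

Under these assumptions, parse_primitive("true") gives True and parse_primitive('"For sure"') gives the string For sure, while an unquoted No raises a ValueError instead of silently turning into false.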

Multiline strings with the | syntax are permitted, but > is not, and neither are the whitespace control markers - and + (as in e.g. >-). They are simply not necessary. For granular whitespace control, use the quoted string syntax with \n for newlines.
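A check for this rule could be as small as the sketch below. It looks at whatever follows the colon of a key and accepts only a bare |. The function name is made up, and rejecting every other header (including YAML's explicit indentation digits such as |2) is my assumption about how strict the rule should be.

```python
def block_header_problem(value):
    """Hypothetical helper: return None if the value is acceptable, else a message."""
    value = value.strip()
    if not value or value[0] not in "|>":
        return None  # not a multiline string header at all
    if value == "|":
        return None  # the only permitted form
    if value[0] == ">":
        return "folded strings (>) are not permitted"
    # Assumption: anything after the | (such as - or +) is rejected.
    return "only a bare | is permitted"
```

Here block_header_problem("|") returns None, whereas both block_header_problem("|-") and block_header_problem(">") return a message describing the violation.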

Dates follow the ISO 8601 format: YYYY-MM-DDThh:mm:ss.sss. A date must specify at least the year and month (just specifying the year results in a number, not a date); every subsequent specifier is optional.
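One possible way to recognise such dates is a regular expression in which every specifier after the month nests inside the previous one, so that a time can never appear without a day. This is only a sketch: parse_strict_date is a hypothetical name, missing parts default to the earliest possible value, and I assume the fractional part is exactly three digits.

```python
import re
from datetime import datetime

DATE_SHAPE = re.compile(
    r"(\d{4})-(\d{2})"            # year and month are mandatory
    r"(?:-(\d{2})"                # day
    r"(?:T(\d{2})"                # hour
    r"(?::(\d{2})"                # minute
    r"(?::(\d{2})"                # second
    r"(?:\.(\d{3}))?)?)?)?)?"     # milliseconds (assumed: exactly 3 digits)
)


def parse_strict_date(text):
    """Hypothetical helper: return a datetime, or None if text is not a date."""
    match = DATE_SHAPE.fullmatch(text)
    if match is None:
        return None
    year, month, day, hour, minute, second, milli = match.groups()
    try:
        return datetime(
            int(year), int(month), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
            int(milli or 0) * 1000,  # datetime expects microseconds
        )
    except ValueError:
        return None  # right shape, but impossible values such as month 13
```

For example, parse_strict_date("2024-09-01T15") gives a datetime for 15:00 on 1 September 2024, while parse_strict_date("2024") returns None, leaving the value to be read as a number.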

Objects are written very similarly to original YAML, with keys not requiring quotes. For simplicity, quotes are only optional if keys start with a letter (A through Z, case-insensitively) or an underscore.
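The key rule can be sketched as a single pattern. The rule above only constrains the first character, so limiting the rest of an unquoted key to letters, digits, underscores, and hyphens is an assumption of mine, as are the helper names.

```python
import json
import re

# First character: a letter or an underscore, as the rule requires.
# Assumption: the remaining characters are letters, digits, "_" or "-".
BARE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def key_needs_quotes(key):
    """Hypothetical helper: True if the key may not be written without quotes."""
    return BARE_KEY.fullmatch(key) is None


def format_key(key):
    """Hypothetical helper: write a key, adding JSON-style quotes when needed."""
    return json.dumps(key) if key_needs_quotes(key) else key
```

With these definitions, format_key("number_array") leaves the key as it is, and format_key("~~weird keys possible~~") wraps it in double quotes.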

Arrays use the - syntax, and must not be indented further than their parent key. That is, the key must start in the same column as the dashes. This is to create more consistency in indentation.
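A linter could check the array rule along these lines. It is a deliberately naive sketch: the function names are made up, it only inspects the first item after a key that has no value on its own line, and it does not know about multiline strings, so text inside a | block can confuse it.

```python
def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def key_column(line):
    """Column where the key on this line starts, skipping leading "- " markers."""
    column = indent_of(line)
    rest = line[column:]
    while rest.startswith("- "):
        column += 2
        rest = rest[2:]
    return column


def misindented_arrays(text):
    """Hypothetical helper: 1-based numbers of lines whose first dash is misplaced."""
    problems = []
    parent = None  # key column of the previous line if it opened a nested value
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not affect the check
        is_item = stripped == "-" or stripped.startswith("- ")
        if is_item and parent is not None and indent_of(line) != parent:
            problems.append(number)
        parent = key_column(line) if stripped.endswith(":") else None
    return problems
```

Given a key indented by two spaces, a first item written with two spaces before its dash passes, while one written with four spaces is reported by line number.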

Lastly, comments: they must appear by themselves on a line (potentially indented), and they are treated as an empty line. Naturally, it is not possible to use them inside multiline strings.
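The comment rule makes a preprocessing pass fairly simple, since a comment is always a whole line. The sketch below blanks out comment lines and leaves multiline strings alone. The function name is hypothetical, and it assumes that the key opening a | block is the first thing on its line (so not written directly after a dash).

```python
def blank_out_comments(text):
    """Hypothetical helper: replace full-line comments with empty lines."""
    result = []
    block_indent = None  # indent of the key that opened the current | block
    for line in text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip(" "))
        if block_indent is not None:
            if not stripped or indent > block_indent:
                result.append(line)  # still inside the multiline string
                continue
            block_indent = None  # a dedent ends the multiline string
        if stripped.startswith("#"):
            result.append("")  # a comment counts as an empty line
            continue
        if stripped.endswith(": |"):
            block_indent = indent
        result.append(line)
    return "\n".join(result)
```

Because comment lines are replaced rather than removed, line numbers in later error messages still match the original file.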

All in all, here's an example of Strict YAML:

example:
  number_array:
  - -.Inf
  - -23
  - 0.5
  - 1.2e+5
  - .Inf
  - .NaN
  is_simpler_than_yaml: "For sure"
  "~~weird keys possible~~": true
  nesting:
    valid_since: 2024-09-01T15
    # Comments also fine
    multiline: |
      Of course. But,
      # this is not a comment.
  lastly: {"json": {"isAllowed": true}}

Now what?

This is a fictitious language. So far. But, sooner or later, I'll write a parser for this. Perhaps I'll share some of the process along the way.