2023-08-16

The Developer's Guide To YAML

Everything you need to know to get started with YAML - examples, config files, serialization and more.



YAML (YAML Ain't Markup Language) is a human-readable data serialization language frequently used to save application state, create configuration files, and transmit data. YAML uses Python-style indentation to denote nested structures, which leads to a tidy language that's more readable than JSON and is often why it is chosen for configuration files.

How is YAML structured?

A YAML document conventionally begins with three dashes '---' followed by the content of the document. The marker is optional for a single document, but it is needed to separate several documents stored in one file. Before the dashes you can add directives that state the YAML version number and define custom tags, but we won't worry about that for now. YAML has three basic primitives: mappings, often referred to as dictionaries, hash maps, hash tables, etc.; sequences, commonly known as lists, vectors, or arrays; and scalars, which can be strings, numbers, booleans, or null. YAML also supports comments, which start with '#' and can be used to describe your data.

To make YAML more readable, mappings and sequences can be written either as block collections or in flow style, with curly braces {} and square brackets [] respectively. For example, a block sequence may look like this.

---
- A
- Block
- Sequence

But a valid sequence may also look like this.

---
[
  "not a",
  "block",
  "Sequence"
]

The same is then true for mappings.

---
# A block map.
a: 0.3 
block: 0.4 
map: hi  
---
# Not a block map.
{a: 0.3, json-style: 0.4, map: "hi"}
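
To check that the two styles really describe the same kind of data, the snippets above can be parsed and compared. This is a minimal sketch assuming the PyYAML package ('pip install PyYAML'); the variable names are only illustrative. Because the mapping example holds two documents separated by '---', it is read with safe_load_all rather than safe_load.

```python
import yaml  # PyYAML, assumed to be installed

block_sequence = """\
---
- A
- Block
- Sequence
"""

flow_sequence = """\
---
[
  "A",
  "Block",
  "Sequence"
]
"""

# Both styles load to the same Python list.
block_list = yaml.safe_load(block_sequence)
flow_list = yaml.safe_load(flow_sequence)
print(block_list == flow_list)

# Two documents in one stream, separated by '---'.
two_mappings = """\
---
# A block map.
a: 0.3
block: 0.4
map: hi
---
# Not a block map.
{a: 0.3, json-style: 0.4, map: "hi"}
"""

# safe_load_all yields one Python object per document; comments are dropped.
documents = list(yaml.safe_load_all(two_mappings))
for document in documents:
    print(document)
```

Both mappings come back as ordinary dictionaries, so the choice between block and flow style only affects how the file reads, not what your program receives.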

Most of the time, you'll want to use block sequences and mappings as they are generally easier to read, especially as you can use indentation to combine and nest these structures. You may have also noticed that the flow syntax looks very similar to JSON. In fact, YAML 1.2 is a superset of JSON! Any JSON you have can be dropped into a YAML file and technically be valid, though it may be rather messy, so I'd recommend converting it to block style first.
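
As a rough illustration of that conversion, the sketch below feeds JSON text straight into a YAML parser and then writes it back out in block style. It assumes PyYAML is installed, and the payload is made up for the example. PyYAML implements the older YAML 1.1, so a few JSON edge cases may not load identically, but a simple payload like this one does.

```python
import json

import yaml  # PyYAML, assumed to be installed

# A hypothetical JSON payload, produced here with the standard library.
original = {
    "name": "demo",
    "ports": [8000, 5000],
    "debug": True,
    "ratio": 0.5,
    "owner": None,
}
json_text = json.dumps(original)

# The JSON text is handed to the YAML parser unchanged.
parsed = yaml.safe_load(json_text)
print(parsed == original)

# Dumping it again in block style gives the tidier, converted form.
block_yaml = yaml.safe_dump(parsed, default_flow_style=False)
print(block_yaml)
```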

YAML Types

As mentioned above, YAML is a superset of JSON so it supports all the types JSON does and more, albeit with different naming conventions.

Below I've included a sample YAML document which utilises all of these different types.

YAML Examples

---
# This entire YAML is a block mapping containing examples of the available types.

# An example of a block mapping.
strings:
  simple: This is a string
  quoted: 'A simple \quoted\ string.'
  escaped: "I'm an escaped string."
  literal: |
            This is a multiline string.
            Newlines are preserved. A literal string.
  folded: >
          This is a multiline string where
          newlines become spaces. A folded string.

# An example of a block sequence.
boolean:
  - true
  - false

mapping: {"mapping":"JSON style"}
sequence: ["a", "JSON", "style", "sequence"]

null_value: null

integers: 
  canonical_integer: 9876
  decimal: +9876
  octal: 0o14
  hexadecimal: 0x7B

floats:
  nan: .nan
  inf: .inf
  canonical_float: 5.62345e+3
  exponential: 56.2345e+02

datetime:
  date: 2023-08-30
  canonical: 2023-08-30T12:29:11.1Z
  iso8601: 2023-08-30t12:29:11.10-01:00
  spaced: 2023-08-30 12:29:11.10 -1
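
To see what these scalars turn into once parsed, here is a small sketch that loads a trimmed-down version of the document above and inspects the resulting Python values. It assumes PyYAML, which follows YAML 1.1; other parsers may resolve some scalars differently. In particular, the 0o14 octal form is a YAML 1.2 notation, and PyYAML leaves it as a string.

```python
import datetime
import math

import yaml  # PyYAML, assumed to be installed

sample = """\
strings:
  literal: |
    line one
    line two
  folded: >
    line one
    line two
boolean: [true, false]
empty: null
integers:
  decimal: +9876
  hexadecimal: 0x7B
  octal: 0o14
floats:
  nan: .nan
  inf: .inf
  exponential: 56.2345e+02
datetime:
  date: 2023-08-30
  canonical: 2023-08-30T12:29:11.1Z
"""

data = yaml.safe_load(sample)

# Literal blocks keep newlines; folded blocks join lines with spaces.
print(repr(data["strings"]["literal"]))
print(repr(data["strings"]["folded"]))

# Numbers, booleans and null map onto the matching Python types.
print(data["boolean"], data["empty"])
print(data["integers"])
print(data["floats"])

# Dates and timestamps become datetime objects.
print(type(data["datetime"]["date"]), type(data["datetime"]["canonical"]))
```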

Configuration Files

Because of YAML's readability, its support for dates and different number representations, and the fact that JSON doesn't support comments, YAML is a popular choice for configuration files. In particular, Continuous Integration and Continuous Delivery (CI/CD) pipelines use YAML constantly to ensure servers are deployed in an automated and reproducible manner each and every time. For instance, Docker Compose files are YAML based, as are configurations for GitHub Actions, Jenkins, and AWS CloudFormation. A simple configuration file from the Docker Compose documentation is shown below.

services:
  web:
    build: .
    ports:
      - "8000:5000"
    volumes:
      - .:/code
    environment:
      FLASK_DEBUG: "true"
  redis:
    image: "redis:alpine"

And here's a more complex example, also taken from the Docker Compose documentation.

services:
  db:
    image: postgres
    volumes:
      - ./data/db:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    environment:
      - POSTGRES_NAME=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    depends_on:
      - db
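
The two examples write the environment section in different ways: the first as a mapping, the second as a sequence of KEY=value strings. Once parsed, these arrive as different types, so code that reads such files has to handle both. The sketch below shows one way to do that, assuming PyYAML; the compose text is a shortened mix of the two files above, and environment_as_dict is a hypothetical helper, not part of Docker Compose.

```python
import yaml  # PyYAML, assumed to be installed

compose_text = """\
services:
  db:
    image: postgres
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      FLASK_DEBUG: "true"
    depends_on:
      - db
"""


def environment_as_dict(service):
    """Return a service's environment as a dict, whichever style was used.

    Hypothetical helper for this sketch; it is not part of Docker Compose.
    """
    environment = service.get("environment", {})
    if isinstance(environment, dict):
        return dict(environment)
    # Sequence style: each entry is a 'KEY=value' string.
    return dict(entry.split("=", 1) for entry in environment)


compose = yaml.safe_load(compose_text)
for name, service in compose["services"].items():
    print(name, environment_as_dict(service), service.get("depends_on", []))
```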

Stringify, Parse, Serialize, and Deserialize

Having seen some examples, you may now be wondering how you actually start using YAML in your projects. You'll want to begin by finding a package that can parse, or deserialize, your YAML file. This involves translating the bytes read from the file into a usable structure in your chosen language. You'll often find that in dynamically typed languages it is converted to a dictionary type, while in statically typed, compiled languages it is deserialized into a struct or a generic enumeration. Let's go over a quick example in Python which uses a YAML configuration file to calculate a value.

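# configuration.yaml
should_calculate: True
polynomial:
  a: 0.003
  b: 0.6
  c: 0.4

# Python script that reads configuration.yaml
import yaml  # PyYAML ('pip install PyYAML')

GLOBAL_CONFIG = {}


def setup():
    """Load the YAML configuration file into the module-level dictionary."""
    global GLOBAL_CONFIG
    # Read-only mode is enough: the file is parsed, never modified.
    with open('configuration.yaml', 'r') as yaml_file:
        GLOBAL_CONFIG = yaml.safe_load(yaml_file)


def calculate():
    """Evaluate a*x**2 + b*x + c using the coefficients from the configuration."""
    x = 2.0
    my_poly_coeffs = GLOBAL_CONFIG['polynomial']
    y = my_poly_coeffs['a'] * x**2 + my_poly_coeffs['b'] * x + my_poly_coeffs['c']
    return y


if __name__ == "__main__":
    setup()

    if GLOBAL_CONFIG['should_calculate']:
        answer = calculate()
        print(answer)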

As you can see, once loaded with the PyYAML package the configuration you supplied can be used like a normal Python dictionary - a data structure that can be completely generic when you run your code. Let's take a look at doing this in a strongly typed language now. Using an example from the serde_yaml documentation we can create YAML files from a struct (stringify/serialize) and then recreate the struct from the YAML string (parse/deserialize).

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct Point {
    x: f64,
    y: f64,
}

fn main() -> Result<(), serde_yaml::Error> {
    let point = Point { x: 1.0, y: 2.0 };

    let yaml = serde_yaml::to_string(&point)?;
    assert_eq!(yaml, "x: 1.0\ny: 2.0\n");

    let deserialized_point: Point = serde_yaml::from_str(&yaml)?;
    assert_eq!(point, deserialized_point);
    Ok(())
}
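
For comparison, the same serialize and deserialize round trip can be sketched in Python. This is only an illustration, assuming PyYAML and a dataclass that mirrors the Point struct above; PyYAML works on plain dictionaries, so the dataclass is converted on the way in and out.

```python
from dataclasses import asdict, dataclass

import yaml  # PyYAML, assumed to be installed


@dataclass
class Point:
    x: float
    y: float


point = Point(x=1.0, y=2.0)

# Serialize (stringify): dataclass -> plain dict -> YAML text.
yaml_text = yaml.safe_dump(asdict(point), default_flow_style=False)
print(yaml_text)

# Deserialize (parse): YAML text -> plain dict -> dataclass.
deserialized_point = Point(**yaml.safe_load(yaml_text))
print(point == deserialized_point)
```

Unlike the serde version, nothing here checks that x and y are really floats, so any validation would have to be written by hand.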

This is great for when you have a well-defined configuration file which won't be changing much. But what if you're receiving configuration files from hundreds of people and need a way to make it completely generic? You can use enumerations! The Value enumeration within the serde_yaml crate enables you to hold a completely generic structure, as the mapping and sequence types are just hashmaps and vectors containing more of the Value enumeration.

pub enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Sequence(Sequence),
    Mapping(Mapping),
    Tagged(Box<TaggedValue>),
}

With this structure in place, you can nest your data to any depth and, in theory, store any structure you wish.
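
The same recursive idea can be sketched in Python, where the dictionaries and lists returned by a YAML parser play the roles of the Mapping and Sequence variants. The flatten function below is a hypothetical helper written for this example, assuming PyYAML; it walks a structure of any depth and lists every scalar with its path.

```python
import yaml  # PyYAML, assumed to be installed


def flatten(value, prefix=""):
    """Yield (path, scalar) pairs from an arbitrarily nested structure."""
    if isinstance(value, dict):      # plays the role of Value::Mapping
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):    # plays the role of Value::Sequence
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:                            # Null, Bool, Number or String
        yield prefix, value


document = yaml.safe_load("""\
polynomial:
  a: 0.003
  terms:
    - name: linear
      enabled: true
    - name: constant
      enabled: null
""")

flat = dict(flatten(document))
for path, scalar in flat.items():
    print(path, "=", scalar)
```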

Useful Libraries

Most languages will have packages already developed to help you handle the YAML format. Below I've included the packages for some popular languages.

If this list doesn't have what you're looking for, check out the YAML homepage or this handy GitHub page which contains a list of useful information.

Useful Tools

I've created a couple of web tools which help you convert, format, and validate YAML, as sometimes opening up your web browser to do a small task is just easier than writing a custom script!

Thanks for taking the time to read this article. You can also find this post on Medium. Next up I'll be diving into the TOML format!


