YAML (YAML Ain't Markup Language) is a human-readable data serialization language frequently used to save application state, create configuration files, and transmit data. YAML uses Python-style indentation to denote nested structures, leading to a tidy language that's more readable than JSON, which is often why it is chosen for configuration files.
A YAML file typically begins with three dashes ('---') followed by the content of the document, although the marker is optional when a file holds a single document. Before the dashes you can add the YAML version number and define custom tags, but we won't worry about that for now. YAML has three basic primitives: mappings, often referred to as dictionaries, hash maps, or hash tables; sequences, commonly known as lists, vectors, or arrays; and scalars, which can be strings or a numeric type. YAML also includes comments, which are declared with '#' and can be used to describe your data.
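To see the document marker and comments in action, here is a minimal sketch using PyYAML, the Python package used later in this article (installed with pip install PyYAML). The sample text and variable names are made up for illustration.

```python
import yaml  # pip install PyYAML

# Two YAML documents in one stream, each introduced by '---'.
stream = """\
---
# Comments are ignored by the parser.
name: first   # trailing comments work too
---
name: second
"""

# safe_load_all yields one Python object per document in the stream.
documents = list(yaml.safe_load_all(stream))
print(documents)
```

Each document comes back as its own dictionary, and none of the comments survive parsing.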
Mappings and sequences can be written either as block collections, which rely on indentation, or in a more compact style using curly braces {} and square brackets [] respectively. For example, a block sequence may look like this.
---
- A
- Block
- Sequence
But a valid sequence may also look like this.
---
[
"not a",
"block",
"Sequence"
]
The same is then true for mappings.
---
# A block map.
a: 0.3
block: 0.4
map: hi
---
# Not a block map.
{a: 0.3, json-style: 0.4, map: "hi"}
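If you want to convince yourself that the two styles really describe the same data, a quick check along these lines should do it. This is a sketch using PyYAML (covered later in the article), and the sample strings are my own rather than the exact documents above.

```python
import yaml  # pip install PyYAML

block_sequence = """\
- a
- block
- sequence
"""
flow_sequence = '["a", "block", "sequence"]'

block_mapping = """\
a: 0.3
b: 0.4
c: hi
"""
flow_mapping = '{a: 0.3, b: 0.4, c: "hi"}'

# Both styles describe the same data, so they parse to equal Python objects.
sequences_match = yaml.safe_load(block_sequence) == yaml.safe_load(flow_sequence)
mappings_match = yaml.safe_load(block_mapping) == yaml.safe_load(flow_mapping)
print(sequences_match, mappings_match)
```

The choice between the two styles is purely about readability; the parser hands you the same list or dictionary either way.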
Most of the time, you'll want to use block sequences and mappings as they are generally easier to read, especially as you can use indentation to combine and nest these structures. You may have also noticed that the syntax can look very similar to JSON. As of version 1.2, YAML is actually a superset of JSON! So any JSON you may have can be thrown into a YAML file and technically be valid, though it may be rather messy, so I'd recommend converting it first.
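As a rough sketch of what that looks like in practice, the snippet below feeds JSON text straight into a YAML parser and then dumps it back out in block style. It uses PyYAML, which is introduced properly later on; the sample data is invented, and since PyYAML follows an older version of the YAML specification, some unusual JSON inputs may not round-trip this cleanly.

```python
import json

import yaml  # pip install PyYAML

original = {
    "name": "demo",
    "ratio": 0.3,
    "tags": ["a", "b"],
    "enabled": True,
    "parent": None,
}

# Produce ordinary JSON text with the standard library.
json_text = json.dumps(original)

# The JSON text is handed to the YAML parser unchanged.
parsed_as_yaml = yaml.safe_load(json_text)
print(parsed_as_yaml == original)

# Dumping in block style gives the more readable YAML form.
print(yaml.safe_dump(parsed_as_yaml, default_flow_style=False))
```

This is also a handy way to do the conversion recommended above without reaching for an external tool.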
Because YAML is a superset of JSON, it supports all the types JSON does and more, albeit with different naming conventions:
Mapping
Sequence
String
Numeric
Boolean
Null
Below I've included a sample YAML document which utilises all of these different types.
---
# This entire YAML is a block mapping containing examples of the available types.
# An example of a block mapping.
strings:
  simple: This is a string
  quoted: 'A simple \quoted\ string.'
  escaped: "I'm an escaped string."
  literal: |
    This is a multiline string.
    Newlines are preserved. A literal string.
  folded: >
    This is a multiline string where
    newlines become spaces. A folded string.
# An example of a block sequence.
boolean:
  - true
  - false
mapping: {"mapping":"JSON style"}
sequence: ["a", "JSON", "style", "sequence"]
null: null
integers:
  canonical_integer: 9876
  decimal: +9876
  octal: 0o14
  hexadecimal: 0x7B
floats:
  nan: .nan
  inf: .inf
  canonical_float: 5.62345e+3
  exponential: 56.2345e+02
datetime:
  date: 2023-08-30
  canonical: 2023-08-30T12:29:11.1Z
  iso8601: 2023-08-30t12:29:11.10-01:00
  spaced: 2023-08-30 12:29:11.10 -1
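To get a feel for how these types come out on the other side of a parser, here is a small sketch that loads a cut-down version of the sample with PyYAML (covered later in the article) and prints the Python type of each value. The keys and values are a subset I picked for illustration, and other parsers may resolve some scalars differently.

```python
import datetime
import math

import yaml  # pip install PyYAML

sample = """\
simple: This is a string
flag: true
nothing: null
hexadecimal: 0x7B
exponential: 56.2345e+02
infinity: .inf
date: 2023-08-30
literal: |
  line one
  line two
folded: >
  line one
  line two
"""

data = yaml.safe_load(sample)

# Show which Python type each YAML scalar was resolved to.
for key, value in data.items():
    print(f"{key}: {type(value).__name__} = {value!r}")

# A few spot checks on the resolved types.
assert isinstance(data["date"], datetime.date)
assert math.isinf(data["infinity"])
```

Note how the literal string keeps its line break while the folded string joins the two lines with a space.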
Because of YAML's readability, its support for dates and several number representations, and the fact that JSON doesn't support comments, YAML is a very popular choice for configuration files. In particular, Continuous Integration and Continuous Delivery (CI/CD) pipelines rely heavily on YAML to ensure servers are deployed in an automated and reproducible manner every time. For instance, Docker Compose files are YAML based, as are GitHub Actions, Jenkins, and AWS CloudFormation. A simple configuration file from the Docker Compose documentation is shown below.
services:
  web:
    build: .
    ports:
      - "8000:5000"
    volumes:
      - .:/code
    environment:
      FLASK_DEBUG: "true"
  redis:
    image: "redis:alpine"
And here's a more complex example, also taken from the Docker Compose documentation.
services:
  db:
    image: postgres
    volumes:
      - ./data/db:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    environment:
      - POSTGRES_NAME=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    depends_on:
      - db
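Notice that the two examples write the environment section differently: the first uses a mapping, the second a sequence of KEY=value strings. Once parsed, that difference shows up as a dictionary versus a list, so code reading such files has to cope with both. The sketch below shows one way to do that in Python with PyYAML (introduced in the next section). The helper environment_as_dict is a hypothetical name of my own, not part of Docker Compose or PyYAML, and it assumes every list entry contains an '='.

```python
import yaml  # pip install PyYAML

compose_text = """\
services:
  web:
    environment:
      FLASK_DEBUG: "true"
  db:
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
"""


def environment_as_dict(service):
    """Return a service's environment as a dict, whichever form was used."""
    env = service.get("environment") or {}
    if isinstance(env, dict):
        return dict(env)
    # List form: each entry is assumed to be a "KEY=value" string.
    return dict(entry.split("=", 1) for entry in env)


compose = yaml.safe_load(compose_text)
environments = {
    name: environment_as_dict(service)
    for name, service in compose["services"].items()
}
print(environments)
```

After normalising, both services expose their environment as a plain dictionary regardless of how it was written in the file.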
Having seen some examples, you may now be wondering how you actually start using YAML in your projects. You'll want to begin by finding a package that can parse or deserialize your YAML file. This involves translating it from the bytes read from the file into a usable structure in your chosen language. You'll often find that in interpreted languages it is converted to a dictionary type, while in strongly typed, compiled languages it is represented by an enumeration or mapped onto a structure you define. Let's go over a quick example in Python which uses a YAML configuration file to calculate a value.
# configuration.yaml
should_calculate: True
polynomial:
  a: 0.003
  b: 0.6
  c: 0.4
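import yaml  # pip install PyYAML

GLOBAL_CONFIG = {}

def setup():
    """Load the YAML configuration into the module-level dictionary."""
    global GLOBAL_CONFIG
    # Read-only mode is enough here; safe_load only builds plain Python types.
    with open('configuration.yaml', 'r') as yaml_file:
        GLOBAL_CONFIG = yaml.safe_load(yaml_file)

def calculate():
    """Evaluate a*x**2 + b*x + c at x = 2.0 using the configured coefficients."""
    x = 2.0
    my_poly_coeffs = GLOBAL_CONFIG['polynomial']
    y = my_poly_coeffs['a'] * x**2 + my_poly_coeffs['b'] * x + my_poly_coeffs['c']
    return y

if __name__ == "__main__":
    setup()
    # Only run the calculation when the configuration asks for it.
    if GLOBAL_CONFIG['should_calculate']:
        answer = calculate()
        print(answer)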
As you can see, once loaded with the PyYAML package, the configuration you supplied can be used like a normal Python dictionary, a generic data structure whose contents are only known when you run your code. Let's take a look at doing this in a strongly typed language now. Using an example from the serde_yaml documentation, we can create a YAML string from a struct (stringify/serialize) and then recreate the struct from that YAML string (parse/deserialize).
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct Point {
    x: f64,
    y: f64,
}

fn main() -> Result<(), serde_yaml::Error> {
    let point = Point { x: 1.0, y: 2.0 };

    let yaml = serde_yaml::to_string(&point)?;
    assert_eq!(yaml, "x: 1.0\ny: 2.0\n");

    let deserialized_point: Point = serde_yaml::from_str(&yaml)?;
    assert_eq!(point, deserialized_point);
    Ok(())
}
This is great when you have a well-defined configuration file which won't be changing much. But what if you're receiving configuration files from hundreds of people and need a way to handle them generically? You can use enumerations! The Value enumeration within the serde_yaml crate enables you to hold a completely generic structure, as the mapping and sequence variants are essentially hash maps and vectors containing more Value enumerations.
pub enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Sequence(Sequence),
    Mapping(Mapping),
    Tagged(Box<TaggedValue>),
}
With this structure in place, you can nest your data to any depth and, in theory, store any structure you wish.
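The same idea carries over to the Python side, where the nested dictionaries, lists, and scalars returned by a YAML parser play the role of the Value enumeration. As a rough sketch of working with such a generic structure, the function below walks arbitrarily nested data and lists every scalar with its path. The name flatten and the sample data are my own, and only the standard library is needed.

```python
def flatten(value, prefix=""):
    """Yield (path, scalar) pairs for every scalar nested inside value."""
    if isinstance(value, dict):
        # Mapping: extend the path with each key.
        for key, item in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from flatten(item, path)
    elif isinstance(value, list):
        # Sequence: extend the path with each index.
        for index, item in enumerate(value):
            yield from flatten(item, f"{prefix}[{index}]")
    else:
        # Scalar: string, number, boolean, or None.
        yield prefix, value


config = {"services": {"web": {"ports": ["8000:5000"]}}, "version": 3}
flat = list(flatten(config))
print(flat)
```

The function has one branch per kind of node (mapping, sequence, scalar), which mirrors how you would match on the variants of Value in Rust.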
Most languages will have packages already developed to help you handle the YAML format. Below I've included packages for some popular languages.
JavaScript: yaml
Python: PyYAML
C: libcyaml
C++: yaml-cpp
C#: YamlDotNet
Go: go-yaml
Fortran: fortran-yaml
Java: yamlbeans
Haskell: HsYAML
Kotlin: kaml
Rust: serde_yaml
Swift: Yams
If this list doesn't have what you're looking for, check out the YAML homepage or this handy GitHub page which contains a list of useful information.
I've created a couple of web tools which help you convert, format, and validate YAML, as sometimes opening up your web browser to do a small task is just easier than writing a custom script!
Thanks for taking the time to read this article. You can also find this post on Medium. Next up I'll be diving into the TOML format!