Syard

 

Note

This document is a work in progress and not yet finished.

Introduction

The Syard file format is a simple, human-readable and human-writable file format for non-hierarchical, record-based data.

Its main advantages over CSV are:
  • it works well for long field values and many rows
  • it doesn’t need escaping rules for special characters
  • it comes with support for different character encodings

Syard is the successor of the Shipyard data format. Syard is simpler and better defined (yet less flexible) than Shipyard and does not depend on a specific programming language.

Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

File format

This document describes the Syard file format version 0.1 (“Syard v0.1”).

A Syard v0.1 file consists of

  • a file header specifying the format version and encoding
  • zero or more data records separated by empty lines
  • comment lines

Lines

A line is a sequence of characters terminated by the platform’s end of line character sequence.

A Syard parser MAY support other platforms’ end of line character sequences.

A Syard parser MUST support lines up to a length of 255 characters (including the end of line sequence). It MAY support longer lines.

Empty lines

An empty line is a line that contains only zero or more whitespace characters.

Any Syard reader MUST at least recognise space and tab characters as whitespace. It MAY recognise other characters as whitespace if and only if they are considered to be whitespace by the encoding defined by the header.

The end of a file MUST be interpreted as an empty line.
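The empty-line rule can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name is_empty_line is hypothetical, and the sketch recognises just the mandatory whitespace set (space and tab).

```python
def is_empty_line(line: str) -> bool:
    """Return True if the line holds only spaces and tabs (or nothing at all)."""
    # Drop the end of line sequence first, then the mandatory whitespace
    # characters. A reader MAY treat further characters as whitespace,
    # depending on the encoding; this sketch does not.
    return line.rstrip("\r\n").strip(" \t") == ""
```

Because the end of a file is interpreted as an empty line, a reader built on this check can feed it an empty string once the input is exhausted.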

File header

Every Syard file MUST start with a header line as the first line in the file. The header line MUST be:

!SYARD v$VERSION -*- coding: $ENCODING -*-

where $VERSION specifies the version of the Syard specification in use and $ENCODING the character encoding used for that file. $VERSION consists of one or more alphanumeric characters and dots; $ENCODING consists of one or more non-whitespace characters.

The leading ! MUST NOT be preceded by any other character (including whitespace).

A Syard parser MAY refuse to read files with a higher version number than the one it was written for. It MUST signal an error as soon as it reads a line it cannot understand.

Any Syard parser MUST understand at least the encoding utf-8. It MAY recognise other encodings. It MAY refuse to read files with an encoding it doesn’t understand. It MUST signal an error as soon as it reads a line containing a character it does not understand.

An example of a header line for a UTF-8 encoded Syard v0.1 file is:

!SYARD v0.1 -*- coding: utf-8 -*-
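A minimal sketch of how a reader might check the header line and then decode the rest of the file is shown here in Python. The names HEADER_RE, parse_header and decode_syard are hypothetical. The sketch assumes that "alphanumeric" means ASCII letters and digits, that nothing but the end of line sequence may follow the closing -*-, and that the header line itself can be read as ASCII before the real encoding is known (which would not hold for encodings such as UTF-16).

```python
import codecs
import re

# One space between the parts, exactly as in the header template.
HEADER_RE = re.compile(r"!SYARD v([A-Za-z0-9.]+) -\*- coding: (\S+) -\*-")


def parse_header(line: str) -> tuple[str, str]:
    """Return (version, encoding) or raise ValueError for a malformed header."""
    match = HEADER_RE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a valid Syard header line")
    return match.group(1), match.group(2)


def decode_syard(data: bytes) -> tuple[str, str]:
    """Return (version, decoded text) for the raw bytes of a Syard file."""
    # Assumption: the first line is plain ASCII, so it can be read
    # before the declared encoding is known.
    first_line = data.split(b"\n", 1)[0].decode("ascii")
    version, encoding = parse_header(first_line)
    codecs.lookup(encoding)  # raises LookupError for an unknown encoding
    return version, data.decode(encoding)  # raises UnicodeDecodeError on bad bytes
```

Calling parse_header on the example header line above returns the pair ("0.1", "utf-8"); a header preceded by a space is rejected with a ValueError.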

Note

Because the header line uses the -*- coding: ... -*- convention, Emacs automatically recognises the encoding of Syard files. Other editors may do so, too.

Data records

A data record consists of one or more data fields. Data records are separated by one or more empty lines. The empty lines are not part of the data record. Multiple consecutive empty lines MUST be interpreted as one empty line.

Data fields

A data field consists of a field name followed by the field separator and the field value.

The field name MUST NOT start with a whitespace character, number sign (#) or exclamation mark (!), MUST NOT contain colons (:) and MUST be at least 1 character long. A Syard reader MUST support field names up to a length of 100 characters. It MAY support longer field names.

The field separator is exactly one colon (:) followed by exactly one space character (other whitespace characters are not allowed).

The field value is everything that follows the field separator on the same line (including trailing whitespace), together with the content of zero or more following continuation lines.

The field value MAY be empty. A Syard reader MUST support field values up to a length of 10240 characters. It MAY support longer field values.
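The rules for field names and the field separator could be checked roughly as follows. This is a sketch under assumptions, not a reference implementation: the function name split_field is hypothetical, the first colon on the line is taken to be the separator (field names cannot contain colons), and any additional space after the separator is kept as part of the value.

```python
def split_field(line: str) -> tuple[str, str]:
    """Split the first line of a data field into (name, value)."""
    text = line.rstrip("\r\n")
    colon = text.find(":")
    if colon < 1:
        # No colon at all, or a colon in first position (empty field name).
        raise ValueError("missing field name or field separator")
    if text[colon + 1:colon + 2] != " ":
        raise ValueError("the colon must be followed by a space character")
    name = text[:colon]
    if name[0].isspace() or name[0] in "#!":
        raise ValueError("field name starts with a forbidden character")
    # Everything after the separator is the value; trailing whitespace is kept.
    return name, text[colon + 2:]
```

Under these assumptions a line such as "title: Hello: world " yields the name "title" and the value "Hello: world " with its trailing space preserved, since colons are only forbidden in the field name.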

A continuation line is a line starting with a space character (other whitespace characters are not allowed). Its content (not including the first space character) is appended to the last field value of the current data record.

A Syard reader MUST signal an error if there is a continuation line without a data field it belongs to (i.e. before the first data record or immediately after an empty line).
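Continuation handling can be illustrated with a small Python sketch. The function name fold_continuations is hypothetical. The sketch assumes it receives the lines of a single data record (no empty lines, comment lines already removed) and that the content of a continuation line is appended directly, without inserting any separator.

```python
def fold_continuations(lines: list[str]) -> list[str]:
    """Merge continuation lines into the field line they belong to."""
    logical: list[str] = []
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith(" "):
            if not logical:
                raise ValueError("continuation line without a data field")
            # Drop only the first space; the rest is appended unchanged.
            logical[-1] += text[1:]
        else:
            logical.append(text)
    return logical
```

With direct appending, a writer that wants a space between the joined parts has to start the continuation line with two spaces: the first marks the continuation, the second becomes part of the value.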

Comment lines

A comment line is a line starting with # as the first character. Comment lines are allowed everywhere after the header line.

A Syard reader MUST ignore all comment lines.
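Putting the pieces together, a very small reader might look like the Python sketch below. It is meant as an illustration of the rules in this document, not as a definitive implementation. The name read_syard, the field names in the usage example, and the choice to return each record as a list of (name, value) pairs are all assumptions; the sketch works on already decoded text, accepts LF, CRLF and CR as end of line sequences, and treats a line consisting only of spaces and tabs as an empty line rather than as a continuation line.

```python
import re

HEADER_RE = re.compile(r"!SYARD v([A-Za-z0-9.]+) -\*- coding: (\S+) -\*-")


def read_syard(text: str) -> list[list[tuple[str, str]]]:
    """Parse decoded Syard text into a list of records."""
    lines = re.split(r"\r\n|\n|\r", text)
    if HEADER_RE.fullmatch(lines[0]) is None:
        raise ValueError("line 1: missing or malformed header line")
    records: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []  # fields of the record being built
    for number, line in enumerate(lines[1:], start=2):
        if line.startswith("#"):
            continue  # comment lines are ignored everywhere
        if line.strip(" \t") == "":
            # One or more empty lines end the current record.
            if current:
                records.append(current)
                current = []
            continue
        if line.startswith(" "):
            if not current:
                raise ValueError(f"line {number}: continuation without a data field")
            name, value = current[-1]
            current[-1] = (name, value + line[1:])
            continue
        colon = line.find(":")
        if (colon < 1 or line[colon + 1:colon + 2] != " "
                or line[0].isspace() or line[0] == "!"):
            raise ValueError(f"line {number}: not a valid data field")
        current.append((line[:colon], line[colon + 2:]))
    if current:
        records.append(current)  # the end of the file counts as an empty line
    return records


sample = (
    "!SYARD v0.1 -*- coding: utf-8 -*-\n"
    "# hypothetical example data\n"
    "name: Ada\n"
    "note: first\n"
    "  second\n"
    "\n"
    "\n"
    "name: Bob\n"
)
records = read_syard(sample)
```

Keeping fields as an ordered list of pairs rather than a dictionary is a deliberate choice in this sketch, because this document does not say whether a field name may occur more than once in a record.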