Module shipyard

Shipyard is a module to process data in a format inspired by email headers (RFC 2822).

File format

Character encoding

A character encoding can be specified similar to PEP 0263 using:

# -*- coding: <encoding name> -*-

on the first line, where # stands for the actual comment mark.

More precisely, the first line must match the regular expression:

^#.*coding[:=]\s*([-\w.]+)

Again, # stands for the actual comment mark. The first group of this expression is interpreted as the encoding name.
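
As a rough illustration of the rule above (not Shipyard's actual implementation), the encoding line could be detected with Python's re module. The helper name detect_coding and its comment_mark parameter are hypothetical.

```python
import re


def detect_coding(first_line, comment_mark="#"):
    """Return the encoding name declared on first_line, or None.

    Hypothetical helper: it builds the documented pattern with the
    given comment mark substituted for '#'.
    """
    pattern = r"^%s.*coding[:=]\s*([-\w.]+)" % re.escape(comment_mark)
    match = re.match(pattern, first_line)
    # The first group holds the encoding name.
    return match.group(1) if match else None


coding = detect_coding("# -*- coding: utf-8 -*-")
```

A line that does not match, such as an ordinary field, yields None in this sketch.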

Data set

A data set consists of zero or more records separated by one or more empty lines.

Comment

Lines starting with the comment mark (default: #) are ignored. Comments can be used in or between records.
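
The data set and comment rules can be sketched as a small generator. This is only an illustration under stated assumptions: split_records is a hypothetical name, and a line counts as empty here only if nothing remains after the line ending is removed.

```python
def split_records(lines, comment_mark="#"):
    """Yield one list of lines per record (hypothetical helper)."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(comment_mark):
            continue  # comments are ignored, in or between records
        if not line:
            # One or more empty lines end the current record.
            if record:
                yield record
                record = []
        else:
            record.append(line)
    if record:
        yield record


sample = [
    "# -*- coding: utf-8 -*-\n",
    "\n",
    "id: 0\n",
    "# a comment inside a record\n",
    "name: Martin Chalfie\n",
    "\n",
    "\n",
    "id: 1\n",
    "name: Osamu Shimomura\n",
]
records = list(split_records(sample))
```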

Record

A record consists of one or more fields.

Field

A field is a line that has the form:

key: value
key is a string that
  • doesn't contain a colon
  • doesn't start with the comment mark
  • doesn't start with the continuation mark

value is an arbitrary string. It can span multiple lines using continuation marks.
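
The field rules above can be checked with a few lines of Python. This is a sketch, not the module's own parser: parse_field is a hypothetical name, ValueError stands in for whatever exception the module raises, and stripping the whitespace around the value is an assumption.

```python
def parse_field(line, comment_mark="#", continuation_mark=" "):
    """Split a 'key: value' line into a (key, value) tuple."""
    if line.startswith(comment_mark) or line.startswith(continuation_mark):
        raise ValueError("not a field line: %r" % line)
    # The key cannot contain a colon, so the first colon ends it.
    key, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing colon: %r" % line)
    # Assumption: whitespace around the value is not significant.
    return key, value.strip()


field = parse_field("name: Martin Chalfie")
```

Because only the first colon is used as the separator, a value may itself contain colons.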

Continuation

If a line starts with the continuation mark (default: a single space), it is appended to the preceding line with the continuation mark removed.
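
A minimal sketch of this joining step, assuming the lines have already been stripped of their line endings. The name join_continuations is hypothetical, and the sketch says nothing about how the keep_linebreaks option of the real parser behaves.

```python
def join_continuations(lines, continuation_mark=" "):
    """Merge continuation lines into the line that precedes them."""
    joined = []
    for line in lines:
        if line.startswith(continuation_mark) and joined:
            # Append the line with only the mark itself removed.
            joined[-1] += line[len(continuation_mark):]
        else:
            joined.append(line)
    return joined


merged = join_continuations([
    "rationale: for the discovery of broken",
    " symmetry",
    "year: 2008",
])
```

Since only the mark is removed, this sketch inserts no separator at the join: the two halves above become "brokensymmetry".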

Usage

Obviously we need to import shipyard:
>>> import shipyard
First we open the file:
>>> input = open('nobel.sy')
Then we create a parser object:
>>> reader = shipyard.Parser(keep_linebreaks=False,
...                          keys=['id', 'discipline', 'year',
...                                'name', 'country', 'rationale'])

For every record the given keys are initialized with None.
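
One way to picture this initialization is dict.fromkeys. The helper new_record below is hypothetical and only illustrates the idea; it is not taken from the module.

```python
def new_record(keys, fields):
    """Return a dict with every expected key, defaulting to None."""
    record = dict.fromkeys(keys)  # all expected keys start out as None
    record.update(fields)         # fields found in the file override None
    return record


record = new_record(["id", "name", "country"],
                    {"id": "0", "name": "Martin Chalfie"})
```

A key that is missing from a record, such as country here, therefore still exists and maps to None.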

Now we can iterate through the records:

>>> for record in reader.parse(input):    # doctest:+ELLIPSIS
...     print record['country']
United States
Japan
United States
...
Instead of iterating we may want to get a list of dicts:
>>> input.seek(0)
>>> lod = reader.get_list(input)
>>> print lod     # doctest:+ELLIPSIS
[{u'discipline': u'Chemistry', u'name': u'Martin Chalfie', ...}, {u'discipline': u'Chemistry', u'name': u'Osamu Shimomura', ...}, ...]
Sometimes we need a dict of dicts (using the 'id' field as key):
>>> input.seek(0)
>>> dod = reader.get_dict(input, key='id')
>>> print dod.keys()
[u'11', u'10', u'1', u'0', u'3', u'2', u'5', u'4', u'7', u'6', u'9', u'8']
>>> print dod[u'5'][u'rationale']
for the discovery of the mechanism of spontaneous brokensymmetry in subatomic physics
If we don't want dicts we can use the 'factory' parameter:
>>> input.seek(0)
>>> los = reader.get_list(input, factory = lambda **keys: ', '.join(keys.values()))
>>> print los[0]
Chemistry, Martin Chalfie, United States, for the discovery and development of the green fluorescentprotein, GFP, 2008, 0
Of course a class works as a factory, too:
>>> input.seek(0)
>>> class Laureate(object):
...     def __init__(self, id, discipline, year, name, country, rationale):
...         self.name = name
>>> doo = reader.get_dict(input, key='id', factory = Laureate)
>>> print doo[u'2']      # doctest:+ELLIPSIS
<Laureate object at ...>
>>> print doo[u'2'].name
Roger Y. Tsien
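
Both factory examples suggest that the factory is called once per record with the fields as keyword arguments. A sketch of that idea, using hypothetical names (build_dict, and a shortened Laureate class) rather than the module's own code:

```python
class Laureate(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name


def build_dict(records, key, factory=dict):
    """Map record[key] to factory(**record) for every record."""
    return {record[key]: factory(**record) for record in records}


records = [
    {"id": "0", "name": "Martin Chalfie"},
    {"id": "2", "name": "Roger Y. Tsien"},
]
dod = build_dict(records, key="id")                    # dict of dicts
doo = build_dict(records, key="id", factory=Laureate)  # dict of objects
```

Any callable that accepts the record's keys as keyword arguments can serve as a factory in this sketch.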

Now let's write a Shipyard file.

First we create a StringIO (any other file-like object will do, too):
>>> import StringIO
>>> output = StringIO.StringIO()
Next we need a Writer object:
>>> writer = shipyard.Writer(keys=('foo', 'bar'), coding='utf-8')
Now we can use write() to write a single record:
>>> writer.write(output, {'foo': 1, 'bar': 2})
>>> print output.getvalue()
foo: 1
bar: 2
<BLANKLINE>
<BLANKLINE>
Using write_many() we can write a list of records:
>>> output = StringIO.StringIO()
>>> d = [{'foo': i, 'bar': 2*i} for i in range(3)]
>>> writer.write_many(output, d)
>>> print output.getvalue()
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
To get an encoding line we use write_coding():
>>> output = StringIO.StringIO()
>>> writer.write_coding(output)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
<BLANKLINE>
Now let's do everything at once using write_full():
>>> output = StringIO.StringIO()
>>> writer.write_full(output, d)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
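
The output layout shown above can be reproduced with a few lines of plain Python. This is a simplified sketch, not the Writer class itself: write_records is a hypothetical function, and it does not handle values that span several lines.

```python
import io


def write_records(stream, records, keys, coding=None, comment_mark="#"):
    """Write records as 'key: value' lines, one empty line after each."""
    if coding is not None:
        # Encoding line followed by an empty line.
        stream.write("%s-*- coding: %s -*-\n\n" % (comment_mark, coding))
    for record in records:
        for key in keys:  # keys fixes the order of the fields
            stream.write("%s: %s\n" % (key, record[key]))
        stream.write("\n")  # an empty line ends the record


output = io.StringIO()
write_records(output,
              [{"foo": 0, "bar": 0}, {"foo": 1, "bar": 2}],
              keys=("foo", "bar"), coding="utf-8")
text = output.getvalue()
```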


Classes
  InvalidLineError
      Something is wrong with a line
  InvalidKeyError
      Something is wrong with a key
  Parser
      Reader for Shipyard files
  Writer
      Writer for Shipyard files