user2879704

YAML dump for Python lists uses inline format instead of hyphen + space

I have a Python ordered dictionary like this:

from collections import OrderedDict
a = OrderedDict()
a['name'] = 'hello'
a['msgs'] = ['hello', 'world']

And I am converting it to YAML like this:

import yaml
with open("b.yaml", 'w') as stream:
  stream.write(yaml.dump(a))

It writes:

!!python/object/apply:collections.OrderedDict
- - [name, hello]
  - - msgs
    - [hello, world]

Whereas I expected a simpler YAML format:

name : hello
msgs:
   - hello
   - world

How can I force YAML to print list items in hyphen + space notation instead of the JSON-like [a, b, c, d] notation?

Why does PyYAML print an OrderedDict item as [name, hello] and not as name : hello?

Upvotes: 3

Views: 7650

Answers (2)

Anthon

Reputation: 76578

If you don't read the YAML specification, you would expect mappings in YAML files to be ordered, since their text representation in a YAML file is ordered. Unfortunately this intuitive assumption is false: the YAML 1.2 specification explicitly states that a mapping [should be interpreted as] an unordered set of key: value pairs.

This makes comparing YAML files with tools like diff virtually impossible if you use mappings and load/change/dump them, and checking such files into a revision control system results in spurious extra revisions that are semantically the same but syntactically different.

I set out to improve PyYAML for other reasons as well (YAML 1.2 compatibility instead of the old 1.1 spec, preservation of comments, bug fixes), and the resulting library, ruamel.yaml, also preserves the key order of mappings if you use its round_trip_dump:

import ruamel.yaml
from ruamel.yaml.comments import CommentedMap as OrderedDict

a = OrderedDict()
a['name'] = 'hello'
a['msgs'] = ['hello', 'world']

with open("b.yaml", 'w') as stream:
    ruamel.yaml.round_trip_dump(a, stream, indent=5, block_seq_indent=3)

which gives you a file b.yaml with content:

name: hello
msgs:
   - hello
   - world

which is exactly what you expected.

Please note that I passed the stream to round_trip_dump. If you use PyYAML you should do this as well, as it is more efficient.
You need to use CommentedMap, which is just a thin wrapper around OrderedDict/ordereddict that allows preservation of comments etc.
The default indent is 2 and the default block_seq_indent is 0.

If you load your file using round_trip_load, you'll get a CommentedMap again, in which the order of the keys will be as expected.

Upvotes: 2

metatoaster

Reputation: 18898

Your question conflates a few things. Starting with your initial example: you need explicit code to turn your a = {...} into an OrderedDict. Setting that aside, this is the output you get from a plain dict:

>>> a = {
...   "name" : 'hello',
...   "msgs" : ['hello', 'world']
... }
>>> print(yaml.dump(a))
msgs: [hello, world]
name: hello

Which isn't exactly what you wanted. If you read the FAQ entry on this issue, you will find that passing default_flow_style=False to dump yields the desired block style:

>>> print(yaml.dump(a, default_flow_style=False))
msgs:
- hello
- world
name: hello
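The keys above still come out sorted alphabetically. If the original key order matters too, a small sketch like the following may help. It assumes PyYAML 5.1 or later (where dump accepts sort_keys) and Python 3.7+ (where a plain dict keeps insertion order):

```python
import yaml

a = {
    "name": "hello",
    "msgs": ["hello", "world"],
}

# default_flow_style=False selects block style (hyphen + space) for the list;
# sort_keys=False keeps the keys in insertion order instead of sorting them.
text = yaml.dump(a, default_flow_style=False, sort_keys=False)
print(text)
```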

As for why the OrderedDict comes out like that, this is discussed in the section "YAML tags and Python types" in the documentation. In short, PyYAML falls back to Python's pickle protocol for objects it has no dedicated representation for, and since an OrderedDict is pickled as a list of key/value pairs (lists are ordered; dicts are unordered by definition), it gets the list-like representation.
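If you want to keep the OrderedDict and still get a plain mapping, one common workaround is to register a representer for it. This is a sketch, not the only way to do it: OrderedDumper and represent_ordered_dict are names made up for this example, while add_representer and represent_mapping are part of PyYAML. Passing data.items() rather than the dict itself is what keeps PyYAML from sorting the keys:

```python
from collections import OrderedDict

import yaml


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass, so the global dumpers stay untouched."""


def represent_ordered_dict(dumper, data):
    # Emit a regular YAML mapping; iterating over items() preserves the order.
    return dumper.represent_mapping("tag:yaml.org,2002:map", data.items())


OrderedDumper.add_representer(OrderedDict, represent_ordered_dict)

a = OrderedDict()
a['name'] = 'hello'
a['msgs'] = ['hello', 'world']

text = yaml.dump(a, Dumper=OrderedDumper, default_flow_style=False)
print(text)
```

The result no longer carries the !!python/object/apply tag, so it loads back as an ordinary mapping rather than as an OrderedDict.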

Upvotes: 5
