MAPLUZ

Reputation: 21

Factoring mapping keys in a YAML file

I have this YAML file:

pb:
  {EF:{16, 19}, EH:{16, 19}}

When I apply my flattenDict Python function to it, I get this:

{('pb', 'EF', 16): None,
 ('pb', 'EF', 19): None,
 ('pb', 'EH', 16): None,
 ('pb', 'EH', 19): None}

I am looking for a YAML syntax like the one below that gives the same result (I want to factor out the repeated parts of my YAML node data):

pb:
  {{EF, EH}, {16, 19}}

Do you have any idea how to do this?

Here is my Python flattenDict function:

#!/usr/bin/env python
# encoding: UTF-8
from collections.abc import Mapping  # collections.Mapping was removed in Python 3.10
from operator import add
from pprint import pprint as pp

import yaml

# Sentinel meaning "no key prefix yet" (top level of the mapping)
_FLAG_FIRST = object()

def flattenDict(d, join=add, lift=lambda x: x):
    """Return a list of (joined_key, leaf_value) pairs for a nested mapping."""
    results = []
    def visit(subdict, results, partialKey):
        for k, v in subdict.items():
            newKey = lift(k) if partialKey is _FLAG_FIRST else join(partialKey, lift(k))
            if isinstance(v, Mapping):
                visit(v, results, newKey)    # nested mapping: extend the key
            else:
                results.append((newKey, v))  # leaf: store the full key path
    visit(d, results, _FLAG_FIRST)
    return results

with open('data.yaml', 'r') as fp:
    testdata = yaml.safe_load(fp)
result = flattenDict(testdata, lift=lambda x: (x,))
pp(dict(result))
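To try flattenDict without a data.yaml file, here is a self-contained sketch. It assumes that yaml.safe_load turns the YAML above into the nested dict written out literally below (each {16, 19} flow mapping becomes a dict whose values are None), so PyYAML itself is not needed to run it. Under that assumption it prints the same four-key dict shown above.

```python
from collections.abc import Mapping
from operator import add
from pprint import pprint

# Sentinel meaning "no key prefix yet" at the top level
_FLAG_FIRST = object()

def flattenDict(d, join=add, lift=lambda x: x):
    """Return (joined_key, leaf_value) pairs for a nested mapping."""
    results = []

    def visit(subdict, partialKey):
        for k, v in subdict.items():
            newKey = lift(k) if partialKey is _FLAG_FIRST else join(partialKey, lift(k))
            if isinstance(v, Mapping):
                visit(v, newKey)             # descend, extending the key prefix
            else:
                results.append((newKey, v))  # leaf: record the full key path

    visit(d, _FLAG_FIRST)
    return results

# Assumed result of yaml.safe_load on "pb: {EF: {16, 19}, EH: {16, 19}}"
testdata = {'pb': {'EF': {16: None, 19: None}, 'EH': {16: None, 19: None}}}

# lift wraps each key in a 1-tuple so that add concatenates key paths
flat = dict(flattenDict(testdata, lift=lambda x: (x,)))
pprint(flat)
```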

Upvotes: 1

Views: 1264

Answers (1)

Anthon

Reputation: 76902

In YAML you can use a complex flow node as a key, even as a simple key (i.e. one without the ? explicit-key marker). This holds for both YAML 1.2 and YAML 1.1. That means that this:

{a: 1, b: 2}: mapping
[1, 2, a]: sequence

is correct YAML.

The problem is that a mapping normally loads as a Python dict and a sequence as a Python list, both of which are mutable, cannot be hashed, and are not allowed as keys for a Python dict (try executing python -c "{{'a': 1}: 2}").

PyYAML (which supports YAML 1.1) errors out on both of those lines.
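To see the hashability problem directly in Python, together with the immutable types that avoid it (tuple, and frozenset, which comes up in the notes below), here is a short standard-library-only check:

```python
# dict and list are mutable and unhashable, so they cannot be dict keys
for bad_key in ({'a': 1}, [1, 2, 'a']):
    try:
        {bad_key: 'value'}
    except TypeError as exc:
        print(type(bad_key).__name__, '->', exc)

# Their immutable counterparts are hashable and work fine as keys
ok = {(1, 2, 'a'): 'sequence', frozenset({'a', 'b'}): 'set'}
print(ok[(1, 2, 'a')], ok[frozenset({'b', 'a'})])
```

Note that frozenset lookups ignore element order, which matches the unordered nature of a YAML set.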

Since Python has an immutable counterpart of list in the form of tuple, I implemented loading of sequence keys in ruamel.yaml (which supports YAML 1.2 and YAML 1.1) by constructing them as tuples. So the following works:

import sys
import ruamel.yaml
from pprint import pprint as pp

yaml_str = """\
[pb, EF, 16]: 
[pb, EF, 19]: 
[pb, EH, 16]: 
[pb, EH, 19]: 
"""


yaml = ruamel.yaml.YAML(typ='rt')
# yaml.indent(mapping=4, sequence=4, offset=2)
# yaml.preserve_quotes = True
data = yaml.load(yaml_str)

pp(data)
print('---------')
yaml.dump(data, sys.stdout)

printing:

{('pb', 'EF', 16): None,
 ('pb', 'EF', 19): None,
 ('pb', 'EH', 16): None,
 ('pb', 'EH', 19): None}
---------
[pb, EF, 16]:
[pb, EF, 19]:
[pb, EH, 16]:
[pb, EH, 19]:

If you try to load the above YAML with PyYAML, it throws an exception:

found unhashable key
  in "<unicode string>", line 1, column 1:
    [pb, EF, 16]: 
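What ruamel.yaml does for sequence keys can be mimicked on plain Python data: convert each list key to a tuple, recursively, before it goes into the dict. freeze is a hypothetical helper written for this illustration; it only shows the idea and is not ruamel.yaml's actual constructor code.

```python
def freeze(key):
    """Recursively turn lists into tuples so the result is hashable.

    Hypothetical helper for illustration only, not ruamel.yaml code.
    """
    if isinstance(key, list):
        return tuple(freeze(item) for item in key)
    return key

# (key, value) pairs with list keys, as they would look before building the dict
pairs = [(['pb', 'EF', 16], None), (['pb', 'EF', 19], None),
         (['pb', 'EH', 16], None), (['pb', 'EH', 19], None)]

data = {freeze(k): v for k, v in pairs}
print(data)
```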

Notes:

  • If you don't need round-tripping, use typ="safe": it uses the faster C loader, which also handles sequences as keys, but it doesn't dump them back as neatly; they come out as explicit keys marked with ?.

  • A proposal to add a frozendict to Python was not accepted, so there is no equivalent, not even in the standard library, of what tuple is for list (an immutable, hashable mapping), and ruamel.yaml doesn't support mappings as keys out of the box. You can of course add this to ruamel.yaml's Constructor if you have such a frozendict.

  • Although there is a frozenset in Python, and a set in YAML, ruamel.yaml does not currently accept the following as input:

    !!set {a , b}: value
    
  • Probably needless to say: you cannot change the elements of such a key programmatically without deleting and re-adding the key-value pair.
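If you want to experiment with the frozendict route from the notes above, here is a minimal hashable mapping. FrozenDict is a hypothetical class written for this sketch; it is not in the standard library and not part of ruamel.yaml, and hooking it into ruamel.yaml's Constructor is not shown. The last lines also illustrate the final note: to "change" a key you pop the pair and re-add it.

```python
from collections.abc import Mapping

class FrozenDict(Mapping):
    """Minimal immutable, hashable mapping (hypothetical, not in the stdlib)."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Order-independent hash; all values must be hashable as well
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return f'FrozenDict({self._data!r})'

# Python counterpart of the YAML "{a: 1, b: 2}: mapping"
data = {FrozenDict(a=1, b=2): 'mapping'}
print(data[FrozenDict(b=2, a=1)])  # equal mappings compare and hash equally

# A key cannot be modified in place: remove the pair and re-add it
value = data.pop(FrozenDict(a=1, b=2))
data[FrozenDict(a=1, b=3)] = value
print(data)
```

Equality comes from collections.abc.Mapping, so a FrozenDict also compares equal to a plain dict with the same items.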

Upvotes: 1
