ben w

Reputation: 2535

Generate YAML anchors/references with overrides using PyYAML

A thing you can do with human-written YAML is this:

foo: &foo_anchor
  key1: v1
  key2: v2
  key3: v3
bar:
  <<: *foo_anchor
  key2: override_value

I would like to generate output like that programmatically using PyYAML. It seems tricky! By default, as far as I can tell, PyYAML only generates anchors/references when it encounters equal objects (and the order is probably not defined, whereas in this example bar has to reference foo, not the other way around). I've tried a few things, such as defining a YamlReference class and checking for its tag in an overridden Dumper.serialize_node method, but trying to do something like:

        if node.tag.endswith('magic.prefix.YamlReference'):
            alias = node.value[0].value
            self.emit(yaml.events.AliasEvent(alias))
            super(Dumper, self).anchor_node(node.value[1])
            super(Dumper, self).serialize_node(node.value[1], parent, idx)

messes up the expected event stream. Is this possible?
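For reference, the default behaviour described above can be checked with a small sketch. It assumes PyYAML >= 5.1 (for sort_keys); the names shared, same_object and equal_copy are made up for illustration. As far as this test shows, the anchor/alias pair appears when the very same dict object is reached twice, while an equal but separate copy is simply written out again:

```python
import yaml

shared = {'key1': 'v1', 'key2': 'v2'}

# The same dict object is referenced twice, so the dumper emits
# an anchor on the first occurrence and an alias on the second.
same_object = yaml.dump({'a': shared, 'b': shared}, sort_keys=False)

# dict(shared) is equal to shared but is a distinct object,
# so it is dumped in full and no anchor is generated.
equal_copy = yaml.dump({'a': shared, 'b': dict(shared)}, sort_keys=False)

print(same_object)
print(equal_copy)
```

With sort_keys=False the first key in insertion order gets the anchor, which is what decides which side ends up as the reference.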

Upvotes: 2

Views: 1758

Answers (2)

Anthon

Reputation: 76578

One way of achieving this is to create a class that holds the merge information and still allows lookups such as data['bar']['key1']. Of course, you also need to dump this class properly with an appropriate representer.

This is what ruamel.yaml (disclaimer: I am the author of that package) does to allow round-tripping of merged maps:

import sys
import ruamel.yaml

yaml_str = """\
foo: &foo_anchor
  key1: v1
  key2: v2
  key3: v3
bar:
  <<: *foo_anchor
  key2: override_value
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

which gives:

foo: &foo_anchor
  key1: v1
  key2: v2
  key3: v3
bar:
  <<: *foo_anchor
  key2: override_value

So I suggest you look at the class CommentedMap and how it is handled in constructor.py and representer.py.

If you can upgrade to ruamel.yaml, then you can do:

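import sys
import ruamel.yaml

cm = ruamel.yaml.comments.CommentedMap

data = cm()
# foo is the mapping that carries the anchor
data['foo'] = foo = cm(key1='v1', key2='v2', key3='v3')
foo.yaml_set_anchor('foo_anchor')
# bar stores only its own override; the other keys come from the merge
data['bar'] = bar = cm(key2='override_value')
bar.add_yaml_merge([(0, foo)])

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)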

which gives something similar to what you expect, starting from scratch:

foo: &foo_anchor
  key1: v1
  key2: v2
  key3: v3
bar:
  <<: *foo_anchor
  key2: override_value

And of course the following works as expected:

print(list(data['bar'].keys()))
print(data['bar']['key3'])

to give:

['key2', 'key1', 'key3']
v3

Upvotes: 0

flyx

Reputation: 39638

Well you can do something like this:

import yaml

class Merger(object):
    pass

def merger_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:merge', '<<')

yaml.add_representer(Merger, merger_representer)

foo = {'key1': 'v1', 'key2': 'v2', 'key3': 'v3'}

root = {
    'foo': foo,
    'bar': {
        Merger(): foo,
        'key2': 'override_value'
    }
}

print(yaml.dump(root, sort_keys=False))

Output is:

foo: &id001
  key1: v1
  key2: v2
  key3: v3
bar:
  <<: *id001
  key2: override_value

sort_keys=False keeps the keys in insertion order, so foo is emitted (and gets the anchor) before bar references it. This requires Python >= 3.7 and PyYAML >= 5.1 (thanks @tinita). You have no control over the generated anchor name, but this YAML is equivalent to yours.

You need the Merger class to force PyYAML to emit << (with a normal string key, it would emit '<<' so that it doesn't get confused with a merge key).
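A quick way to convince yourself of both points is to load the dumped text back. The sketch below repeats the Merger setup so that it runs on its own, and assumes PyYAML >= 5.1; the variable names text, loaded and plain_key are only illustrative:

```python
import yaml

class Merger(object):
    pass

def merger_representer(dumper, data):
    # Emit the key as a merge-tagged scalar so it is written as a plain <<
    return dumper.represent_scalar('tag:yaml.org,2002:merge', '<<')

yaml.add_representer(Merger, merger_representer)

foo = {'key1': 'v1', 'key2': 'v2', 'key3': 'v3'}
root = {'foo': foo, 'bar': {Merger(): foo, 'key2': 'override_value'}}

text = yaml.dump(root, sort_keys=False)

# Loading applies the merge: bar gets foo's keys, with key2 overridden.
loaded = yaml.safe_load(text)
print(loaded['bar'])

# For comparison, an ordinary string key '<<' is dumped in quotes,
# so it loads back as a normal key rather than as a merge.
plain_key = yaml.dump({'<<': 1})
print(plain_key)
```

If the loaded bar mapping contains key1 and key3 from foo plus the overridden key2, the merge key was emitted the way a hand-written document would have it.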

Upvotes: 1
