user9144393

Reputation: 31

Updating a yaml file with the contents of another one

Setup:
I have two YAML files: a huge one with tags and aliases, and a small one with some key-value pairs from the big one. I'm using Python 2.7.

Problem:
I want to update values in the big one with the values present in the small one.

Challenge:
The small YAML can contain any combination of the key-value pairs existing in the big one. I also have to preserve the literal structure of the big one (not resolve tags/aliases). The big one is complicated, containing dictionaries of dictionaries of lists of dictionaries (don't ask...). Is that even possible?

Especially for things like:

resources: &something
   that_thing: some_stuff
   some_other_stuff: this_thing

For which I want to get, for example:

resources: &something
   that_thing: some_stuff
   some_other_stuff: this_thing_updated

As that does not fit into a dict nicely (I think?)

Upvotes: 3

Views: 1480

Answers (1)

Anthon

Reputation: 76578

If the keys of the small file are unique in the big file, it is relatively simple to walk the data structure of the big file and update a value wherever its key also occurs in the small file:

import sys
import ruamel.yaml

big_yaml = """\
resources: &something
   that_thing: !SomeStuff 
      a:
      - 1
      - some_stuff: d
      b: *something
   some_other_stuff: this_thing
"""

small_yaml = """\
some_stuff: 42
some_other_stuff: 'the_other_thing'
"""

def walk_tree(d, update, done=None):
    if not update:
        return
    if done is None:
        # ids of containers already visited; a fresh set for every top-level
        # call (a mutable default argument would be shared between calls)
        done = set()
    if id(d) in done:
        return
    if isinstance(d, dict):
        done.add(id(d))
        for k in d:
            if k in update:
                d[k] = update.pop(k)
                continue  # don't recurse in the newly updated value
            walk_tree(d[k], update, done)  # recurse into the values
    elif isinstance(d, list):
        done.add(id(d))
        for elem in d:
            walk_tree(elem, update, done)
    # doing nothing for leaf-node scalars


yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=2, sequence=2, offset=0)
yaml.preserve_quotes = True
big = yaml.load(big_yaml)
small = yaml.load(small_yaml)
# print(data)
walk_tree(big, small)

yaml.dump(big, sys.stdout)

which gives:

resources: &something
  that_thing: !SomeStuff
    a:
    - 1
    - some_stuff: 42
    b: *something
  some_other_stuff: 'the_other_thing'

Please note that:

  • you need to keep the set of ids of the nodes, to prevent infinite recursion. That is of course only necessary if your big data is recursive, but it can't hurt.
  • I pop the values of update, so the small file becomes empty and recursion stops early. If you want to update all matching keys, then don't pop, just assign (and then you can remove the first two lines from walk_tree)
  • the tag is preserved, even though the program knows nothing about how to create a !SomeStuff class. It's a kind of magic!
  • Anchors are currently only dumped if there is an actual alias referring to them. The anchor is preserved, so ruamel.yaml can probably be patched to always dump the anchor (in general, when also adding new elements that are represented multiple times in the data tree, that might cause conflicts and needs extra checking)
  • superfluous quoting around 'the_other_thing' is preserved
  • your mapping and sequence indents need to be consistent, otherwise they get reformatted. You can uncomment the yaml.indent line and adjust the values (to match the big file)
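The "don't pop, just assign" variant mentioned above could look roughly like the following. This is only a sketch on plain dicts and lists: the name update_all and the sample data are made up for illustration. It should behave the same on data loaded by ruamel.yaml, since the walk relies only on isinstance checks against dict and list.

```python
def update_all(node, update, seen=None):
    """Assign update[k] to every matching key k found anywhere in node."""
    if seen is None:
        seen = set()  # ids of containers already visited (guards against cycles)
    if id(node) in seen:
        return
    if isinstance(node, dict):
        seen.add(id(node))
        for key in node:
            if key in update:
                node[key] = update[key]  # assign without popping: later matches still update
            else:
                update_all(node[key], update, seen)
    elif isinstance(node, list):
        seen.add(id(node))
        for elem in node:
            update_all(elem, update, seen)
    # leaf scalars need no handling


# hypothetical sample data: 'shared' plays the role of an aliased node
shared = {"some_stuff": "d"}
big = {
    "resources": {
        "a": [1, shared],
        "b": shared,
        "some_other_stuff": "this_thing",
        "nested": {"some_stuff": "e"},
    }
}
big["resources"]["self"] = big["resources"]  # make the data recursive on purpose

small = {"some_stuff": 42, "some_other_stuff": "the_other_thing"}
update_all(big, small)
```

Because nothing is popped, small is left untouched and both some_stuff entries get the new value; the early "if not update: return" check is no longer useful here.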

Alternatively you can have a small YAML looking like:

[resources, that_thing, a, 1, some_stuff]: 42
[resources, some_other_stuff]: 'the_other_thing'

then you can have deterministic recursion into the data structure based on the elements of each key sequence, doing away with checking for ids and dragging along update (just pass the value in as the second parameter of your walk_tree).
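A minimal sketch of that path-based approach, using plain Python data: the helper name set_by_path is hypothetical, and the tuples below stand in for the sequence keys of the small file. How your loader represents such sequence keys is an assumption you should check; the sketch only needs something that supports indexing and slicing.

```python
def set_by_path(data, path, value):
    """Follow path (mapping keys and list indices) into data and set the last element."""
    node = data
    for step in path[:-1]:
        node = node[step]  # a str indexes a mapping, an int indexes a sequence
    node[path[-1]] = value


# hypothetical stand-in for the big file
big = {
    "resources": {
        "that_thing": {"a": [1, {"some_stuff": "d"}]},
        "some_other_stuff": "this_thing",
    }
}

# hypothetical stand-in for the small file with sequence keys
updates = {
    ("resources", "that_thing", "a", 1, "some_stuff"): 42,
    ("resources", "some_other_stuff"): "the_other_thing",
}

for path, value in updates.items():
    set_by_path(big, path, value)
```

A path that does not exist in the big data raises KeyError or IndexError here, which is arguably what you want: it signals that the small file refers to something the big file does not contain.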


If all your updating is at the top level of the big file, then none of that recursion is necessary, as that is just a simple corner case of the above.
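For that top-level case a plain loop is enough. A sketch with made-up data; the "if key in big" guard is an assumption about what you want (skipping keys the big file does not have rather than adding them):

```python
# hypothetical stand-ins for the loaded big and small files
big = {"name": "old", "resources": {"that_thing": "some_stuff"}, "count": 1}
small = {"name": "new", "count": 2, "not_in_big": "ignored"}

for key, value in small.items():
    if key in big:  # only overwrite keys that already exist at the top level
        big[key] = value
```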

Upvotes: 2
