Setup:
I have two YAML files: a huge one with tags and aliases, and a small one with some key-value pairs taken from the big one. I'm using Python 2.7.
Problem:
I want to update values in the big one with the values present in the small one.
Challenge:
The small YAML can contain any combination of the key-value pairs that exist in the big one. I also have to preserve the literal structure of the big one (not resolve tags/aliases). The big one is complicated, containing dictionaries of dictionaries of lists of dictionaries (don't ask...). Is that even possible?
Especially for things like:
resources: &something
  that_thing: some_stuff
  some_other_stuff: this_thing
For these I want to get, for example:
resources: &something
  that_thing: some_stuff
  some_other_stuff: this_thing_updated
As that does not fit nicely into a dict (I think?).
If the keys of the small file are unique in the big file, it is relatively simple to walk the data structure of the big file and update its values where the keys match:
import sys
import ruamel.yaml

big_yaml = """\
resources: &something
  that_thing: !SomeStuff
    a:
    - 1
    - some_stuff: d
    b: *something
  some_other_stuff: this_thing
"""

small_yaml = """\
some_stuff: 42
some_other_stuff: 'the_other_thing'
"""


def walk_tree(d, update, done=None):
    # done holds the ids of the containers already visited. It is created
    # per top-level call instead of being a mutable default argument,
    # which would be shared (and stale) between separate calls.
    if done is None:
        done = set()
    if not update:
        return  # everything from the small file has been applied
    if id(d) in done:
        return  # already visited, e.g. reached again through an alias
    if isinstance(d, dict):
        done.add(id(d))
        for k in d:
            if k in update:
                d[k] = update.pop(k)
                continue  # don't recurse into the newly updated value
            walk_tree(d[k], update, done)  # recurse into the values
    elif isinstance(d, list):
        done.add(id(d))
        for elem in d:
            walk_tree(elem, update, done)
    # doing nothing for leaf-node scalars


yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=2, sequence=2, offset=0)
yaml.preserve_quotes = True
big = yaml.load(big_yaml)
small = yaml.load(small_yaml)
walk_tree(big, small)
yaml.dump(big, sys.stdout)
which gives:
resources: &something
  that_thing: !SomeStuff
    a:
    - 1
    - some_stuff: 42
    b: *something
  some_other_stuff: 'the_other_thing'
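The same walk can be tried without ruamel.yaml installed. This is a minimal sketch, assuming plain dicts and lists as stand-ins for ruamel.yaml's round-trip containers (the isinstance checks in walk_tree already rely on those being dict and list subclasses). A dict that contains itself plays the role of the *something alias, to show that the id bookkeeping stops the recursion. The consume parameter is a hypothetical addition, not part of the answer's code: with consume=False the values are assigned instead of popped, so every matching key in the tree gets updated.

```python
def walk_tree(d, update, done=None, consume=True):
    """Replace values in d whose keys appear in update (sketch).

    done:    ids of containers already visited, guards against alias cycles.
    consume: if True, pop applied keys from update and stop once it is empty;
             if False (hypothetical variant), update every matching key.
    """
    if done is None:
        done = set()
    if consume and not update:
        return
    if id(d) in done:
        return
    if isinstance(d, dict):
        done.add(id(d))
        for k in list(d):
            if k in update:
                d[k] = update.pop(k) if consume else update[k]
                continue  # don't recurse into the newly assigned value
            walk_tree(d[k], update, done, consume)
    elif isinstance(d, list):
        done.add(id(d))
        for elem in d:
            walk_tree(elem, update, done, consume)
    # scalars (leaf nodes) need no action


# Stand-in for the loaded big file: "b" refers back to the "resources"
# mapping itself, the way the alias *something does.
resources = {
    "that_thing": {"a": [1, {"some_stuff": "d"}]},
    "some_other_stuff": "this_thing",
}
resources["that_thing"]["b"] = resources
big = {"resources": resources}

walk_tree(big, {"some_stuff": 42, "some_other_stuff": "the_other_thing"})
print(big["resources"]["that_thing"]["a"][1])  # {'some_stuff': 42}
print(big["resources"]["some_other_stuff"])  # the_other_thing
print(big["resources"]["that_thing"]["b"] is big["resources"])  # True

# consume=False: the same key is updated wherever it occurs.
tree = {"x": {"k": 1}, "y": [{"k": 2}]}
walk_tree(tree, {"k": 0}, consume=False)
print(tree)  # {'x': {'k': 0}, 'y': [{'k': 0}]}
```

Without the done set, the walk would follow "b" back into "resources" and recurse until Python's recursion limit is hit.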
Please note that:

- walk_tree keeps the ids of the nodes it has visited, to prevent infinite recursion. That is of course only necessary if your big data is recursive, but it can't hurt.
- walk_tree pops the values of update, so the small file becomes empty and recursion stops early. If you want to update all matching keys, then don't pop, just assign (and then you can remove the "if not update: return" check from walk_tree).
- The !SomeStuff tag is preserved, even though the program has no class for it. It's a kind of magic!
- ruamel.yaml can probably be patched to always dump the anchor (in general, when also adding new elements that are represented multiple times in the data tree, that might cause conflicts and needs extra checking).
- If you need a different indentation, uncomment the yaml.indent line and adjust the values (to match the big file).

Alternatively you can have a small YAML looking like:
[resources, that_thing, a, 1, some_stuff]: 42
[resources, some_other_stuff]: 'the_other_thing'
Then you can recurse deterministically into the data structure based on the elements of each key sequence, doing away with checking for ids and dragging along update (just pass the value in as the second parameter of your walk_tree).
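A minimal sketch of that path-based variant, assuming each key of the small file arrives as a sequence (here a tuple) of mapping keys and list indices; the small mapping is built by hand with plain dicts and lists, so no ruamel.yaml types are involved. walk_path is a hypothetical name. The value is passed as the second parameter, and only the nodes along the given path are visited, so no id tracking is needed.

```python
def walk_path(d, value, path):
    """Assign value at the location named by path inside d (sketch).

    path is a sequence of mapping keys and list indices: every element but
    the last selects the next container, the last one is assigned to.
    """
    if len(path) == 1:
        d[path[0]] = value
    else:
        walk_path(d[path[0]], value, path[1:])


# Stand-in for the loaded big file (plain dicts/lists for illustration).
big = {
    "resources": {
        "that_thing": {"a": [1, {"some_stuff": "d"}]},
        "some_other_stuff": "this_thing",
    }
}

# Stand-in for the loaded small file, assuming tuple keys; the 1 is an
# integer index into the list under "a".
small = {
    ("resources", "that_thing", "a", 1, "some_stuff"): 42,
    ("resources", "some_other_stuff"): "the_other_thing",
}

for path, value in small.items():
    walk_path(big, value, path)

print(big["resources"]["that_thing"]["a"][1]["some_stuff"])  # 42
print(big["resources"]["some_other_stuff"])  # the_other_thing
```

A path that does not exist in the big file raises KeyError or IndexError instead of being silently ignored, which makes typos in the small file easy to spot.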
If all your updating is at the top level of the big file, none of that recursion is necessary, as that is just a simple corner case of the above.
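For the top-level case, a sketch with plain dicts standing in for the loaded files (the key names are made up for illustration). It only assigns keys that already exist in the big mapping, which mirrors what walk_tree does; big.update(small) would also add keys that are missing from the big file. In an insertion-ordered mapping (OrderedDict, or dict on Python 3.7+), re-assigning an existing key keeps its position.

```python
# Hypothetical top-level-only data; plain dicts stand in for the loaded YAML.
big = {"resources": {"that_thing": "some_stuff"}, "version": 1, "name": "old"}
small = {"name": "new", "extra": True}

for key, value in small.items():
    if key in big:  # only update keys that already exist in the big file
        big[key] = value

print(big["name"])  # new
print("extra" in big)  # False
print(list(big))  # ['resources', 'version', 'name'] on Python 3.7+
```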