Reputation: 7831
I have some example YAML that looks like this:
type:
- name: foo
  location: bar
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0
I want to merge in a second YAML file that adds to it; it might look like:
releases:
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
-
  templates:
  - name: mysql
    release: 1.0
I've tried converting them both to dicts and then merging them together, but that didn't work at all.
The end result should look like this:
type:
  - name: foo
    location: bar
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0
  - name: mysql
    release: 1.0
This is what I'm getting as a dict:
{'jobs': [{'instances': 1,
           'name': 'appname',
           'templates': [{'name': 'postgres', 'release': 1.0}]},
          {'templates': [{'name': 'mysql', 'release': 1.0}]}],
 'releases': [{'name': 'app1',
               'sha1': '11b318d4ec9f0baf75d8afc6f78cf66f955d459f',
               'url': 'https://url.com/app.tar.gz'},
              {'name': 'app2',
               'sha1': 'ef97bfaff05989ab006e88d28763feb8fbb32d45',
               'url': 'https://url.com/app2.tar.gz'},
              {'name': 'app3',
               'sha1': 'ef97bfaff05989ab006e88d28763feb8fbb32d45',
               'url': 'https://url.com/app3.tar.gz'}],
 'type': [{'location': 'bar', 'name': 'foo'}]}
Notice that my mysql template doesn't end up in the list with the postgres template; it's in a separate dict.
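For reference, this is roughly what I'm doing to build that dict (a minimal sketch; the file names base.yml and add.yml are just placeholders):
import yaml

with open('base.yml') as f1, open('add.yml') as f2:
    base = yaml.safe_load(f1)
    extra = yaml.safe_load(f2)

merged = dict(base)
for key, value in extra.items():
    # concatenate the top-level lists; this is what produces the dict above,
    # with the second file's job added as a new list element instead of its
    # templates being merged into the existing job
    merged[key] = merged.get(key, []) + value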
Upvotes: 3
Views: 7057
Reputation: 176
I would go with a recursive extension. If you read the first file and then the second one into different dictionaries using PyYAML, you can then try the following:
def extend_dict(extend_me, extend_by):
    if isinstance(extend_by, dict):
        for k, v in extend_by.iteritems():  # Python 2; use .items() on Python 3
            if k in extend_me:
                # key present in both: descend and extend the existing value
                extend_dict(extend_me.get(k), v)
            else:
                extend_me[k] = v
    else:
        # non-dict values (lists) are extended in place
        extend_me += extend_by

extend_dict(file1, file2)
The result of the merge will be in the file1 dict.
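For completeness, a minimal sketch of how the two dicts could be loaded before calling it (PyYAML assumed; the file names 1.yaml and 2.yaml are placeholders):
import yaml

with open('1.yaml') as f:
    file1 = yaml.safe_load(f)
with open('2.yaml') as f:
    file2 = yaml.safe_load(f)

extend_dict(file1, file2)
print(yaml.dump(file1, default_flow_style=False))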
Update:
I've added some stuff. It looks a bit messy, but I think you can add the keys you want to be extendable to EXTENDABLE_KEYS.
Test it with the proper YAML files and let me know if it works as it should.
EXTENDABLE_KEYS = ('templates', )

def extend_dict(extend_me, extend_by):
    if isinstance(extend_me, dict):
        for k, v in extend_by.iteritems():
            if k in extend_me:
                extend_dict(extend_me[k], v)
            else:
                extend_me[k] = v
    else:
        if isinstance(extend_me, list):
            extend_list(extend_me, extend_by)
        else:
            extend_me += extend_by

def extend_list(extend_me, extend_by):
    missing = []
    for item1 in extend_me:
        if not isinstance(item1, dict):
            continue
        for item2 in extend_by:
            if not isinstance(item2, dict) or item2 in missing:
                continue
            # Check if any key is an extendable key
            if filter(lambda x: x in EXTENDABLE_KEYS, item1.keys()):
                extend_dict(item1, item2)
            else:
                missing += [item2, ]
    extend_me += missing

extend_dict(file1, file2)
print yaml.dump(file1, default_flow_style=False)
With that snippet, I obtain the following:
type:
- location: bar
  name: foo
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0
  - name: mysql
    release: 1.0
Upvotes: 3
Reputation: 76588
There are three sequences in your example input files, and only two of those get "merged" by extending the sequence from the first example with the one from the second. The one that doesn't get merged is the sequence that is the value for the jobs key.
Because of that you cannot just walk over the data structure loaded from the second example and "merge" any list you encounter; you have to do this explicitly:
import sys
import ruamel.yaml as yaml

def update(l1, l2):
    # extend the first sequence with a copy of the second one
    l1.extend(l2[:])

data1 = yaml.round_trip_load(open('1.yaml'))
data2 = yaml.round_trip_load(open('2.yaml'))

# only these two sequences are actually extended
update(data1['releases'], data2['releases'])
update(data1['jobs'][0]['templates'], data2['jobs'][0]['templates'])

yaml.round_trip_dump(data1, sys.stdout)
with output:
type:
- name: foo
  location: bar
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0
  - name: mysql
    release: 1.0
This is not exactly what you expected, because your output example inconsistently indents the sequence that is the value for type, compared to the value for e.g. releases.
In the above, update() can easily be made recursive (on sequence elements and mapping values), but there has to be some criterion for selecting which sequences to "merge" and which not (i.e. the value for jobs).
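As an illustration only, not part of the code above, one way such a criterion could look is a whitelist of keys whose sequences may be extended; the key set and the helper name recursive_update are assumptions:
# Sketch: merge only sequences that sit under these keys (assumed whitelist)
MERGE_KEYS = {'releases', 'templates'}

def recursive_update(d1, d2):
    for key, value in d2.items():
        if key not in d1:
            d1[key] = value
        elif key in MERGE_KEYS and isinstance(d1[key], list):
            # whitelisted sequence: extend it with the items from the second file
            d1[key].extend(value)
        elif isinstance(d1[key], dict) and isinstance(value, dict):
            recursive_update(d1[key], value)
        elif isinstance(d1[key], list) and isinstance(value, list):
            # non-whitelisted sequence (e.g. jobs): pair items by position and
            # merge each pair of mappings instead of extending the list
            for item1, item2 in zip(d1[key], value):
                if isinstance(item1, dict) and isinstance(item2, dict):
                    recursive_update(item1, item2)

# reusing data1/data2 loaded with round_trip_load() above
recursive_update(data1, data2)
yaml.round_trip_dump(data1, sys.stdout)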
Please note that because of the use of round_trip_load(), the order of the keys in the mappings is preserved automatically. If the first file has any comments, these are preserved as well, and so are any end-of-line comments in the second file that are part of the sequences that get merged.
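A minimal sketch of that comment preservation (the comment text here is made up purely for illustration):
import sys
import ruamel.yaml as yaml

# hypothetical document with an end-of-line comment
doc = yaml.round_trip_load("releases:\n- name: app1  # pinned for prod\n")
doc['releases'].append({'name': 'app3'})
# the comment on app1 survives the round trip and appears in the dump
yaml.round_trip_dump(doc, sys.stdout)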
Upvotes: 1