Mike
Mike

Reputation: 7831

python merge complex YAML

I have some example YAML that looks like this:

type:
  - name: foo
    location: bar
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0

I want to merge in a YAML file that adds to that that might look like

releases:
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
-
  templates:
  - name: mysql
    release: 1.0

I've tried to convert them to a dict and then merging them together but that didn't work at all.

The end should look like

type:
  - name: foo
    location: bar
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0
  - name: mysql
    release: 1.0

This is what I'm getting as a dict:

{'jobs': [{'instances': 1,
           'name': 'appname',
           'templates': [{'name': 'postgres', 'release': 1.0}]},
          {'templates': [{'name': 'mysql', 'release': 1.0}]}],
 'releases': [{'name': 'app1',
               'sha1': '11b318d4ec9f0baf75d8afc6f78cf66f955d459f',
               'url': 'https://url.com/app.tar.gz'},
              {'name': 'app2',
               'sha1': 'ef97bfaff05989ab006e88d28763feb8fbb32d45',
               'url': 'https://url.com/app2.tar.gz'},
              {'name': 'app3',
               'sha1': 'ef97bfaff05989ab006e88d28763feb8fbb32d45',
               'url': 'https://url.com/app3.tar.gz'}],
 'type': [{'location': 'bar', 'name': 'foo'}]}

If you notice my mysql template isn't with the postgres template list but its in another dict.

Upvotes: 3

Views: 7057

Answers (2)

Xiflado
Xiflado

Reputation: 176

I would go with the recursive extension. If you read first one file and then the second one into different dictionaries using PyYaml, you can then try the following:

def extend_dict(extend_me, extend_by):
    if isinstance(extend_by, dict):
        for k, v in extend_by.iteritems():
            if k in extend_me:
                extend_dict(extend_me.get(k), v)
            else:
                extend_me[k] = v
    else:
        extend_me += extend_by

extend_dict(file1, file2)

The result of the merge will be in file1 dict.

Update:

I've added some stuff. It looks a bit messy but I think you can add the keys you want to be extendable in EXTENDABLE_KEYS.

Test it with the proper YAML files and let me know if it works as it should.

EXTENDABLE_KEYS = ('templates', )


def extend_dict(extend_me, extend_by):
    if isinstance(extend_me, dict):
        for k, v in extend_by.iteritems():
            if k in extend_me:
                extend_dict(extend_me[k], v)
            else:
                extend_me[k] = v
    else:
        if isinstance(extend_me, list):
            extend_list(extend_me, extend_by)
        else:
            extend_me += extend_by


def extend_list(extend_me, extend_by):
    missing = []
    for item1 in extend_me:
        if not isinstance(item1, dict):
            continue

        for item2 in extend_by:
            if not isinstance(item2, dict) or item2 in missing:
                continue

            # Check if any key is an extendable key
            if filter(lambda x: x in EXTENDABLE_KEYS, item1.keys()):
                extend_dict(item1, item2)
            else:
                missing += [item2, ]
    extend_me += missing


extend_dict(file1, file2)

print yaml.dump(file1, default_flow_style=False)

With that snippet, I obtain the following:

type:
- location: bar
  name: foo
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0
  - name: mysql
    release: 1.0

Upvotes: 3

Anthon
Anthon

Reputation: 76588

There are 3 sequences in your example input files and only two of those get "merged" by extending the sequence from the second example to the one from the first. The one that doesn't get merged is the sequence that is the value for the jobs key.

Because of that you cannot just walk over the data structure loaded from the second example and "merge" any list you encounter and you have to do this explicitly:

import sys
import ruamel.yaml as yaml

def update(l1, l2):
    l1.extend(l2[:])

data1 = yaml.round_trip_load(open('1.yaml'))
data2 = yaml.round_trip_load(open('2.yaml'))
update(data1['releases'], data2['releases'])
update(data1['jobs'][0]['templates'], data2['jobs'][0]['templates'])

yaml.round_trip_dump(data1, sys.stdout)

with output:

type:
- name: foo
  location: bar
releases:
- name: app1
  sha1: 11b318d4ec9f0baf75d8afc6f78cf66f955d459f
  url: https://url.com/app.tar.gz
- name: app2
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app2.tar.gz
- name: app3
  sha1: ef97bfaff05989ab006e88d28763feb8fbb32d45
  url: https://url.com/app3.tar.gz
jobs:
- instances: 1
  name: appname
  templates:
  - name: postgres
    release: 1.0
  - name: mysql
    release: 1.0

Which is not exactly what you expected, because your output example inconsistently indents the sequence that is the value for type, compared to the value for e.g. releases.

In the above the update() can easily be made recursive (on sequence elements and mapping values), but there has to be some criterion to select which sequences to "merge" and which not (i.e. the value for jobs).

Please note that because of the use of the round_trip_load() the order of the keys in the mappings is preserved automatically. If the first file would have any comments, these will be preserved as well. Any end-of-line comments in the second file that are part of the sequences that get merged, is preserved as well.

Upvotes: 1

Related Questions