Can't merge YAMLs with comments

Question

I'm building a tool to automate some work, and I need to merge several YAML configuration files into one. The comments have to survive the merge, because I use them to describe the fields for future readers.
I already managed to do this without the comments by converting the YAML to JSON, merging, and converting back to YAML. I'm willing to use XML or something else, since I can run the tool locally. Does anyone know of anything that can help?

Like this:

File 1

project:
    general:
        environment: ?

    databases:
        # Main Database
        db1:
            host: localhost
            username: root
            password: root123
            dbname: project
            logFile: ?

File 2

project:
    general:
        environment: local

    databases:
        db1:
            # New Log File
            logFile: project.log

Would result in this:

project:
    general:
        environment: local

    databases:
        # Main Database
        db1:
            host: localhost
            username: root
            password: root123
            dbname: project
            # New Log File
            logFile: project.log

Anthon · Accepted Answer

As @flyx indicated, you should look at the round-trip capabilities of ruamel.yaml (disclaimer: I am the author of that package), even though it has no built-in recursive merge and there are a few caveats.

First of all, you should quote your ? values; otherwise you'll get a warning that mapping keys are not allowed (a plain ? normally introduces an explicitly defined mapping key).
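For files with many such placeholders, the quoting can be done mechanically before loading. The following is a minimal sketch using only the standard library; quote_bare_question_marks is a hypothetical helper (not part of ruamel.yaml), and it assumes each placeholder is a bare ? that makes up the whole value of a key: value line.

```python
import re

# Matches "key: ?" lines where the value is nothing but a bare question mark.
# Group 1 keeps the indentation, the key, the colon and the following spaces.
BARE_QUESTION = re.compile(r"^(\s*[^#\s][^:#]*:\s+)\?\s*$")


def quote_bare_question_marks(text: str) -> str:
    """Return text with every bare `?` value replaced by the quoted '?'."""
    fixed = [BARE_QUESTION.sub(r"\1'?'", line) for line in text.splitlines()]
    return "\n".join(fixed) + "\n"


if __name__ == "__main__":
    sample = "general:\n    environment: ?\n    name: demo\n"
    print(quote_bare_question_marks(sample), end="")
```

Lines that are comments, or whose value is anything other than a lone ?, are left untouched.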

Also important to know is that association of comments in ruamel.yaml tends to be with the last parsed node before the comment. So in your file2.yaml the # New Log File comment is associated with the preceding key db1 and not with the following logFile.
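If the comments in file2.yaml always sit on their own line directly above a key: value entry with the same indentation, one option is to move them to the end of that entry before loading, so that they end up attached to the key they describe. A rough sketch of that idea, using plain text processing and a hypothetical helper named inline_leading_comments (nothing here comes from ruamel.yaml):

```python
import re

# A line that contains only a comment; group 1 is the indentation, group 2 the text.
COMMENT_LINE = re.compile(r"^(\s*)#\s?(.*)$")
# A "key: value" line with a non-empty scalar value; group 1 is the indentation.
SCALAR_ENTRY = re.compile(r"^(\s*)[^#\s][^:]*:\s+\S.*$")


def inline_leading_comments(text: str) -> str:
    """Move a full-line comment onto the end of the following key: value line."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        comment = COMMENT_LINE.match(lines[i])
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        entry = SCALAR_ENTRY.match(nxt)
        # Be conservative: same indentation only, and skip entries that
        # already contain a '#' (an existing comment or a '#' in the value).
        if comment and entry and comment.group(1) == entry.group(1) and "#" not in nxt:
            out.append(f"{nxt.rstrip()}   # {comment.group(2).strip()}")
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return "\n".join(out) + "\n"
```

A comment above a key whose value is a nested mapping (such as # Main Database above db1:) is deliberately left where it is, because that line has no scalar value to attach the comment to.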

If you are willing to make the input file1.yaml like this:

project:
    general:
        environment: '?'

    databases:
        # Main Database
        db1:
            host: localhost
            username: root
            password: root123
            dbname: project
            logFile: '?'

and file2.yaml like:

project:
    general:
        environment: local

    databases:
        db1:
            logFile: project.log   # New Log File

then this program:

import sys
from pathlib import Path
import ruamel.yaml


def update(d, n):
    if isinstance(n, ruamel.yaml.comments.CommentedMap):
        for k in n:
            d[k] = update(d[k], n[k]) if k in d else n[k]
            if k in n.ca._items and n.ca._items[k][2] and \
               n.ca._items[k][2].value.strip():
                d.ca._items[k] = n.ca._items[k]  # copy non-empty comment
    else:
        d = n
    return d

# YAML() defaults to round-trip mode; the round_trip_load/round_trip_dump
# functions are deprecated in recent ruamel.yaml releases
yaml = ruamel.yaml.YAML()
data1 = yaml.load(Path('file1.yaml'))
update(data1, yaml.load(Path('file2.yaml')))
yaml.dump(data1, sys.stdout)

is enough to give you the following output:

project:
  general:
    environment: local

  databases:
        # Main Database
    db1:
      host: localhost
      username: root
      password: root123
      dbname: project
      logFile: project.log         # New Log File

Please note that it is not necessary for logFile: '?' to be in file1.yaml, as missing keys will be added at the end of the mapping.
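The ordering is easiest to see on plain Python dicts, where assigning to an existing key keeps its position and assigning to a new key appends it. The sketch below repeats the recursion without any comment handling; deep_merge is a hypothetical name, and I am assuming the loaded mappings order their keys the same way a plain dict does.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base (in place) and return base."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are mappings: descend instead of replacing.
            deep_merge(base[key], value)
        else:
            # Scalar, new key, or type mismatch: the override wins.
            base[key] = value
    return base


if __name__ == "__main__":
    file1 = {"db1": {"host": "localhost", "dbname": "project"}}
    file2 = {"db1": {"logFile": "project.log"}}
    merged = deep_merge(file1, file2)
    print(list(merged["db1"]))  # ['host', 'dbname', 'logFile']
```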

If moving the # New Log File comment to the end of the logFile line is not acceptable, then you'll have to pre-process the data loaded from file2.yaml, which is not that difficult in this situation. Doing that based on the indentation in your original file2.yaml is possible, but it requires quite a few more lines of code to get right and is a bit fragile:

import sys
from pathlib import Path
import ruamel.yaml

INDENT=4

def update(d, n):
    if isinstance(n, ruamel.yaml.comments.CommentedMap):
        for k in n:
            d[k] = update(d[k], n[k]) if k in d else n[k]
            if k in n.ca._items and \
               ((n.ca._items[k][2] and n.ca._items[k][2].value.strip()) or \
                n.ca._items[k][1]):
                d.ca._items[k] = n.ca._items[k]  # copy non-empty comment
    else:
        d = n
    return d


def move_comment(d, depth=0):
    # recursively adjust comment
    if isinstance(d, ruamel.yaml.comments.CommentedMap):
        for k in d:
            if isinstance(d[k], ruamel.yaml.comments.CommentedMap):
                if hasattr(d, 'ca'):
                    comment = d.ca.items.get(k)
                    if comment and comment[3] is not None:
                        # add to first key of the mapping that is the value
                        for k1 in d[k]:
                            d[k].yaml_set_comment_before_after_key(
                                k1,
                                before=comment[3][0].value.lstrip('#').strip(),
                                indent=INDENT*(depth+1))
                            break
            move_comment(d[k], depth+1)
    return d

# YAML() defaults to round-trip mode; the round_trip_load/round_trip_dump
# functions are deprecated in recent ruamel.yaml releases
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=INDENT)
data1 = yaml.load(Path('file1.yaml'))
update(data1, move_comment(yaml.load(Path('file2.yaml'))))
yaml.dump(data1, sys.stdout)

The above gives exactly the output that you asked for with the corrected ('?') file1.yaml and your original file2.yaml.
