Reputation: 109
I'm building a tool to automate some work, and I need to merge several YAML configuration files into one. I need to keep the comments, because they describe the fields for future readers.
I already managed to do this without the comments, by converting the YAML to JSON, merging, and converting back to YAML. I'm willing to use XML or something else, since I can run it locally. Does anyone know of anything that can help me?
Like this:
File 1:
project:
    general:
        environment: ?
    databases:
        # Main Database
        db1:
            host: localhost
            username: root
            password: root123
            dbname: project
            logFile: ?
File 2:
project:
    general:
        environment: local
    databases:
        db1:
            # New Log File
            logFile: project.log
Would result in this:
project:
    general:
        environment: local
    databases:
        # Main Database
        db1:
            host: localhost
            username: root
            password: root123
            dbname: project
            # New Log File
            logFile: project.log
Upvotes: 2
Views: 666
Reputation: 76632
As @flyx indicated, you should look at the round-trip capabilities of ruamel.yaml (disclaimer: I am the author of that package), even though there is no built-in recursive merge and there are a few caveats.
First of all, you should quote your ? values, as otherwise you'll get a warning that mapping keys are not allowed (a plain ? normally introduces an explicitly defined mapping key).
It is also important to know that ruamel.yaml tends to associate a comment with the last node parsed before it. So in your file2.yaml the # New Log File comment is associated with the preceding key db1 and not with the following key logFile.
If you are willing to make the input file1.yaml like this:
project:
    general:
        environment: '?'
    databases:
        # Main Database
        db1:
            host: localhost
            username: root
            password: root123
            dbname: project
            logFile: '?'
and file2.yaml like:
project:
    general:
        environment: local
    databases:
        db1:
            logFile: project.log # New Log File
then this program:
import sys
from pathlib import Path
import ruamel.yaml

def update(d, n):
    if isinstance(n, ruamel.yaml.comments.CommentedMap):
        for k in n:
            d[k] = update(d[k], n[k]) if k in d else n[k]
            if k in n.ca._items and n.ca._items[k][2] and \
               n.ca._items[k][2].value.strip():
                d.ca._items[k] = n.ca._items[k]  # copy non-empty comment
    else:
        d = n
    return d

data1 = ruamel.yaml.round_trip_load(Path('file1.yaml').read_text())
update(data1, ruamel.yaml.round_trip_load(Path('file2.yaml').read_text()))
ruamel.yaml.round_trip_dump(data1, sys.stdout)
is enough to give you the following output:
project:
  general:
    environment: local
  databases:
    # Main Database
    db1:
      host: localhost
      username: root
      password: root123
      dbname: project
      logFile: project.log # New Log File
Please note that it is not necessary for logFile: '?' to be in file1.yaml, as missing keys will be added at the end of the mapping.
If moving the # New Log File comment to the spot after the key is not acceptable, then you'll have to pre-process the data loaded from file2.yaml, which is not that difficult in this situation. Doing that based on the indentation in your original file2.yaml is possible, but requires quite a few more lines of code to get right and is a bit fragile:
import sys
from pathlib import Path
import ruamel.yaml

INDENT = 4

def update(d, n):
    if isinstance(n, ruamel.yaml.comments.CommentedMap):
        for k in n:
            d[k] = update(d[k], n[k]) if k in d else n[k]
            if k in n.ca._items and \
               ((n.ca._items[k][2] and n.ca._items[k][2].value.strip()) or \
                n.ca._items[k][1]):
                d.ca._items[k] = n.ca._items[k]  # copy non-empty comment
    else:
        d = n
    return d

def move_comment(d, depth=0):
    # recursively adjust comment
    if isinstance(d, ruamel.yaml.comments.CommentedMap):
        for k in d:
            if isinstance(d[k], ruamel.yaml.comments.CommentedMap):
                if hasattr(d, 'ca'):
                    comment = d.ca.items.get(k)
                    if comment and comment[3] is not None:
                        # add to first key of the mapping that is the value
                        for k1 in d[k]:
                            d[k].yaml_set_comment_before_after_key(
                                k1,
                                before=comment[3][0].value.lstrip('#').strip(),
                                indent=INDENT*(depth+1))
                            break
            move_comment(d[k], depth+1)
    return d

data1 = ruamel.yaml.round_trip_load(Path('file1.yaml').read_text())
update(data1, move_comment(ruamel.yaml.round_trip_load(Path('file2.yaml').read_text())))
ruamel.yaml.round_trip_dump(data1, sys.stdout, indent=INDENT)
The above gives exactly the output that you asked for, given the corrected ('?') file1.yaml and your original file2.yaml.
Upvotes: 1
Reputation: 39708
You cannot do this with normal YAML implementations because YAML defines that comments are a presentation detail and must not convey content information. Thus, as soon as you parse YAML, you will automatically lose comment information.
There is ruamel.yaml, which provides ruamel.yaml.round_trip_load(). This gives you a CommentedMap (if your YAML has a mapping as root type) which preserves all comments. You can merge such maps element-wise and then output them as YAML again.
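To make "element-wise" concrete, here is a rough sketch of a recursive merge on plain Python dicts. The function name merge and the sample data are my own choices for illustration. I would expect the same recursion to work on a CommentedMap, since it behaves like a mapping, but this sketch does nothing about the comments attached to keys; those need separate handling.

```python
def merge(base, override):
    """Recursively merge `override` into `base` in place and return `base`."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are mappings: descend instead of replacing wholesale.
            merge(base[key], value)
        else:
            # Scalars, lists and new keys: the value from `override` wins.
            base[key] = value
    return base


# Illustrative data, shaped like the files in the question.
file1 = {
    "project": {
        "general": {"environment": "?"},
        "databases": {"db1": {"host": "localhost", "logFile": "?"}},
    }
}
file2 = {
    "project": {
        "general": {"environment": "local"},
        "databases": {"db1": {"logFile": "project.log"}},
    }
}

merged = merge(file1, file2)
print(merged)
```

Keys that exist only in the second mapping are appended after the existing ones, because dicts keep insertion order.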
Depending on the layout of your YAMLs, you may also be able to succeed in merging them on a textual basis. For example, for two YAML files like this:
first.yaml:
foo: bar
spam: egg
second.yaml:
sausage: spam
baked: beans
You can merge them like this by simply adding indentation to each line and concatenating them:
first:
  foo: bar
  spam: egg
second:
  sausage: spam
  baked: beans
You'd just iterate over the lines and prepend the indentation. This will work for any well-formed input YAML files as long as they don't have explicit directive or document end markers in them (--- or ...).
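A minimal sketch of that line-by-line approach, assuming the inputs contain no --- or ... markers. The helper name nest_under and the two-space default are my own choices, and the comments in the sample strings are only there to show that they pass through untouched.

```python
def nest_under(key, text, indent=2):
    """Return `text` indented by `indent` spaces under a new top-level `key`."""
    pad = " " * indent
    lines = [f"{key}:"]
    for line in text.splitlines():
        # Keep blank lines empty so no trailing whitespace is introduced.
        lines.append(pad + line if line.strip() else "")
    return "\n".join(lines) + "\n"


first = "foo: bar\nspam: egg  # kept verbatim\n"
second = "# leading comment\nsausage: spam\nbaked: beans\n"

merged = nest_under("first", first) + nest_under("second", second)
print(merged)
```

Because every line is shifted by the same amount, the relative indentation inside each file is preserved and the comments are carried along as plain text.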
If you want to merge the YAML files on the same level, you can still try to concatenate them; this works fine with my example:
foo: bar
spam: egg
sausage: spam
baked: beans
You can also concatenate them into a single YAML file containing multiple documents, although I am not sure whether this is what you want:
foo: bar
spam: egg
...
---
sausage: spam
baked: beans
This is guaranteed to work as per the YAML specification.
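The multi-document variant can be sketched in a few lines as well. The function name concat_documents is hypothetical, and the sketch assumes each input is a single document without markers of its own.

```python
def concat_documents(*texts):
    """Join YAML texts into one multi-document stream."""
    parts = []
    for text in texts:
        # Make sure each document ends with a newline before the marker.
        parts.append(text if text.endswith("\n") else text + "\n")
    # "..." ends the previous document, "---" starts the next one.
    return "...\n---\n".join(parts)


first = "foo: bar\nspam: egg\n"
second = "sausage: spam\nbaked: beans"

stream = concat_documents(first, second)
print(stream)
```

Note that a consumer then has to read the result as a stream of documents rather than as a single mapping.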
Upvotes: 1