Reputation: 3518
When loading YAML data using ruamel.yaml, anchors and their aliases in the YAML file are the same object in Python:
from ruamel.yaml import YAML
yaml_str = """\
first: &reference [1, 2, 3]
second: *reference
"""
yaml = YAML()
data = yaml.load(yaml_str)
assert(data['first'] is data['second'])
# passes
data['first'].append(4)
print(data['second'])
# output: [1, 2, 3, 4]
I realize this is an intentional feature. However, is there a way to tell load to copy aliases instead when it finds them? I tried overriding yaml.representer.ignore_aliases as mentioned in this answer, but that only affects writing to YAML, not reading from it.
Upvotes: 2
Views: 569
Reputation: 76614
There is no built-in functionality to do what you want. Any time you encounter an alias, you would have to create a node structure in the composer recursively instead of just returning the anchor node for the alias.
The need for anchors and aliases in YAML documents arises from the need to be able to represent recursive data structures:
a = [1, 2]
a.append(a)
The above structure cannot be dumped using the technique shown in the answer you link to, and neither could the representation as a YAML document without expansion:
&id001
- 1
- 2
- *id001
be expanded during loading using the technique suggested in the first paragraph.
Your (non-recursive) example can be dumped with the technique in the linked answer, and it could also be expanded while being loaded.
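To check up front whether a loaded structure is recursive, and therefore cannot be fully expanded, something along these lines might help. It is only a sketch using the standard library: `contains_cycle` is a hypothetical helper name, not part of ruamel.yaml, and it assumes the containers of interest are `dict` and `list` instances (or subclasses of them).

```python
def contains_cycle(obj, _path=None):
    """Return True if a dict/list structure contains itself somewhere below."""
    if _path is None:
        _path = set()
    if not isinstance(obj, (dict, list)):
        return False  # scalars cannot take part in a cycle
    if id(obj) in _path:
        return True  # we reached a container that is its own ancestor
    _path.add(id(obj))
    children = obj.values() if isinstance(obj, dict) else obj
    try:
        return any(contains_cycle(child, _path) for child in children)
    finally:
        # only ancestors count: a merely shared (aliased) object is not a cycle
        _path.discard(id(obj))


a = [1, 2]
a.append(a)
shared = [1, 2, 3]
b = {'first': shared, 'second': shared}
print(contains_cycle(a))  # True
print(contains_cycle(b))  # False
```

Tracking only the current ancestors, rather than every object seen so far, is what lets the helper distinguish a shared list (expandable) from a self-referencing one (not expandable).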
The simplest solution, unless you want to dive into the bowels of ruamel.yaml, is to load your YAML document, dump it with expansion, and then load the result of the dump:
import io
import ruamel.yaml
yaml_str = """\
first: &reference [1, 2, 3]
second: *reference
"""
yaml = ruamel.yaml.YAML()
# never emit aliases when dumping: every occurrence is written out in full
yaml.representer.ignore_aliases = lambda *data: True
data = yaml.load(yaml_str)
# dump the expanded document to an in-memory buffer and load it back
buf = io.BytesIO()
yaml.dump(data, buf)
data = yaml.load(buf.getvalue())
assert(data['first'] is not data['second'])
assert(data['first'] == data['second'])
data['first'].append(4)
assert len(data['first']) == 4
assert len(data['second']) == 3
Of course this is not very efficient when you have a huge file.
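If the extra dump and load is too costly, one alternative is to break the sharing in memory after loading. The following is a minimal sketch, not something ruamel.yaml provides: `expand_aliases` is a hypothetical helper, it assumes only `dict` and `list` containers need to be separated, and it returns plain `dict` and `list` objects, so any round-trip information (comments, anchor names) attached to the loaded data would be lost. It refuses recursive structures, since those cannot be expanded.

```python
def expand_aliases(obj, _path=()):
    """Recursively copy dicts and lists so shared containers become independent."""
    if not isinstance(obj, (dict, list)):
        return obj  # scalars are returned as-is
    if id(obj) in _path:
        raise ValueError('recursive structure cannot be expanded')
    path = _path + (id(obj),)  # ids of the containers we are currently inside
    if isinstance(obj, dict):
        return {key: expand_aliases(value, path) for key, value in obj.items()}
    return [expand_aliases(item, path) for item in obj]


shared = [1, 2, 3]
data = {'first': shared, 'second': shared}  # mimics what an alias produces
expanded = expand_aliases(data)
expanded['first'].append(4)
print(expanded['second'])  # [1, 2, 3]
```

With real input you would pass the result of `yaml.load(...)` instead of the hand-built `data` dictionary.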
Upvotes: 1