CrazyChucky

Reputation: 3518

Copy aliases to separate Python objects when loading YAML with ruamel.yaml

When loading YAML data with ruamel.yaml, an anchored node and its aliases in the YAML file become the same object in Python:

from ruamel.yaml import YAML

yaml_str = """\
first: &reference [1, 2, 3]
second: *reference
"""

yaml = YAML()
data = yaml.load(yaml_str)

assert data['first'] is data['second']
# passes

data['first'].append(4)
print(data['second'])
# output: [1, 2, 3, 4]

I realize this is an intentional feature. However, is there a way to tell load to create a separate copy for each alias it finds instead? I tried overriding yaml.representer.ignore_aliases as mentioned in this answer, but that only affects writing YAML, not reading it.

Upvotes: 2

Views: 569

Answers (1)

Anthon

Reputation: 76614

There is no built-in functionality to do what you want. Whenever the composer encounters an alias, it would have to recursively create a new copy of the anchored node structure, instead of just returning the anchored node itself.

Anchors and aliases are needed in YAML documents because they make it possible to represent recursive data structures:

a = [1, 2]
a.append(a)

The above structure cannot be dumped using the technique shown in the answer you link to, because expanding the self-reference would never terminate. For the same reason, its YAML representation, which has to use an alias:

&id001
- 1
- 2
- *id001

could not be expanded during loading using the technique suggested in the first paragraph.
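The termination argument can be made concrete with a small post-load helper. `expand` is hypothetical and not part of ruamel.yaml: it rebuilds every list and dict occurrence as a new object, so shared (aliased) substructures get duplicated, but it has to reject a cycle such as the recursive list `a` above, since its expansion would be infinite. It returns plain `list`/`dict` objects, so ruamel.yaml's round-trip types (and the comments they carry) are not preserved.

```python
def expand(obj, _active=None):
    """Return a copy of obj where every list/dict occurrence is a distinct object.

    Raises ValueError on a cycle, because expanding it would never terminate.
    """
    if _active is None:
        _active = set()  # ids of containers on the current path from the root
    if isinstance(obj, (list, dict)):
        if id(obj) in _active:
            raise ValueError("recursive structure cannot be expanded")
        _active.add(id(obj))
        try:
            if isinstance(obj, list):
                return [expand(item, _active) for item in obj]
            return {key: expand(value, _active) for key, value in obj.items()}
        finally:
            # Leaving this container: a later, non-nested occurrence is fine.
            _active.discard(id(obj))
    return obj  # scalars are returned unchanged


# Shared, non-recursive data (like the question's example) can be expanded.
shared = [1, 2, 3]
data = {"first": shared, "second": shared}
expanded = expand(data)
expanded["first"].append(4)
print(expanded["second"])  # [1, 2, 3]

# A recursive structure is rejected instead of looping forever.
a = [1, 2]
a.append(a)
try:
    expand(a)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)  # recursive structure cannot be expanded
```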

Your (non-recursive) example can be dumped with the technique in the linked answer, and it could also be expanded while being loaded.

The simplest solution, unless you want to dive into the bowels of ruamel.yaml, is to load your YAML document, dump it with aliases expanded, and then load the result of the dump:

import io

import ruamel.yaml

yaml_str = """\
first: &reference [1, 2, 3]
second: *reference
"""

yaml = ruamel.yaml.YAML()
# Never emit aliases when dumping: every occurrence is written out in full.
yaml.representer.ignore_aliases = lambda *args: True
data = yaml.load(yaml_str)         # 'first' and 'second' are still one object here
buf = io.BytesIO()
yaml.dump(data, buf)               # the dumped text contains no anchors/aliases
data = yaml.load(buf.getvalue())   # reloading creates independent objects

assert data['first'] is not data['second']
assert data['first'] == data['second']

data['first'].append(4)
assert len(data['first']) == 4
assert len(data['second']) == 3

Of course, this is not very efficient when you have a huge file, since the data is loaded twice and dumped once in between.

Upvotes: 1
