Reputation: 23
I use ruamel.yaml in order to parse YAML files and I'd like to identify if the key is the anchor itself or just a pointer. Given the following:
foo: &some_anchor
  bar: 1
baz: *some_anchor
I'd like to understand that foo is the actual anchor and baz is a pointer. From what I can see, there's an anchor property on the node (and also a yaml_anchor method), but both baz and foo show that their anchor is some_anchor, meaning that I cannot differentiate.
How can I get this info?
Upvotes: 2
Views: 1603
Reputation: 76682
In your example &some_anchor is the anchor for the single element mapping bar: 1 and *some_anchor is the alias. Writing that "foo is the actual anchor and baz is a pointer" is, IMO, both incorrect terminology and a confusion of keys with their (anchored/aliased) values. If you had a YAML document:
- 3
- 5
- 9
- &some_anchor
  bar: 1
- 42
- *some_anchor
would you actually say, probably after carefully counting, that "4 is the anchor and 6 is the pointer" (or 3 and 5, depending on where you start counting)?
If you want to test whether a key of a dict has a value that was an anchored node in YAML, or whether that value was an aliased node, you'll have to look at the value, and you'll find that the values for keys foo and baz are the same Python data structure.
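To make "the same Python data structure" concrete, here is a minimal sketch that needs no YAML library. The `data` dict is built by hand as a stand-in for what the loader returns for the question's document, and the extra `oof` key (an equal but separate mapping) is a hypothetical addition for contrast.

```python
# Hand-built stand-in for the loaded document: one dict object reachable via two keys.
shared = {'bar': 1}
data = {'foo': shared, 'baz': shared, 'oof': {'bar': 1}}

# Identity, not equality, is what links an anchored value to its aliases.
print(data['foo'] is data['baz'])  # True: one object, two keys
print(data['foo'] is data['oof'])  # False: equal content, different object
print(data['foo'] == data['oof'])  # True

# A change made through one key is visible through the other.
data['baz']['bar'] = 2
print(data['foo']['bar'])  # 2
```

Because both keys hold one object, nothing on the value itself can say through which key it was "originally" defined; only the order of traversal can.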
On dumping, which key's value gets the anchor and which key's (or keys') values are dumped as an alias is determined entirely by which gets dumped first, as the YAML specification states that an anchor has to come before its use as an alias (an anchor can come after an alias if it is re-defined).
As @relent95 describes, you should recursively walk over the data structure you loaded (to see which key gets there first) and, in both ruamel.yaml and PyYAML, look at the id().
But for PyYAML that only works for complex data (dict, list, objects), as it throws away anchoring information and you will not find the same id() on e.g. an anchored integer value.
The alternative to using the id is to look at the actual anchor name that ruamel.yaml stores in the attribute/property anchor.
If you know up front that your YAML document is as simple as your example (anchored/aliased nodes are values for the root level mapping) you can do:
import sys
import ruamel.yaml

yaml_str = """\
foo: &some_anchor
  bar: 1
baz: *some_anchor
oof: 42
"""

def is_keys_value_anchor(key, data, verbose=0):
    # Walk the root mapping in order: the first value seen with a given
    # anchor name is the anchor (True), later ones are aliases (False).
    anchor_found = set()
    for k, v in data.items():
        res = None
        try:
            anchor = v.anchor.value
            if anchor is not None:
                res = anchor not in anchor_found
                anchor_found.add(anchor)
        except AttributeError:
            # values without an anchor attribute leave res as None
            pass
        if k == key:
            break
    if verbose > 0:
        print(f'key "{key}" {res}')
    return res

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
is_keys_value_anchor('foo', data, verbose=1)
is_keys_value_anchor('baz', data, verbose=1)
is_keys_value_anchor('oof', data, verbose=1)
which gives:
key "foo" True
key "baz" False
key "oof" None
But this is inefficient for root mappings with lots of keys, and it won't find anchors/aliases that are nested deeper in the document. A more generic approach is to recursively walk the data structure once and create a dict that has the anchor name as key and a list of "paths" as value. A path is itself a list of keys/indices with which you can traverse the data structure starting at the root. The first path in each list is the anchor, the rest are aliases:
import sys
import ruamel.yaml

yaml_str = """\
foo: &some_anchor
  - bar: 1
  - klm: &anchored_num 42
baz:
  xyz:
  - *some_anchor
oof: [1, 2, c: 13, magic: [*anchored_num]]
"""

def find_anchor_alias_paths(data, path=None, res=None):
    def check_add_anchor(d, path, anchors):
        # returns False when an alias is found, to prevent recursing into a node twice.
        try:
            anchor = d.anchor.value
            if anchor is not None:
                tmp = anchors.setdefault(anchor, [])
                tmp.append(path)
                return len(tmp) == 1
        except AttributeError:
            pass
        return True

    if path is None:
        path = []
    if res is None:
        res = {}
    if isinstance(data, dict):
        for k, v in data.items():
            next_path = path.copy()
            next_path.append(k)
            if check_add_anchor(v, next_path, res):
                find_anchor_alias_paths(v, next_path, res)
    elif isinstance(data, list):
        for idx, elem in enumerate(data):
            next_path = path.copy()
            next_path.append(idx)
            if check_add_anchor(elem, next_path, res):
                find_anchor_alias_paths(elem, next_path, res)
    return res

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
anchor_alias_paths = find_anchor_alias_paths(data)
for anchor, paths in anchor_alias_paths.items():
    print(f'anchor: "{anchor}", anchor_path: {paths[0]}, alias_path(s): {paths[1:]}')
print('value for last anchor/alias found', data.mlget(paths[-1], list_ok=True))
which gives:
anchor: "some_anchor", anchor_path: ['foo'], alias_path(s): [['baz', 'xyz', 0]]
anchor: "anchored_num", anchor_path: ['foo', 1, 'klm'], alias_path(s): [['oof', 3, 'magic', 0]]
value for last anchor/alias found 42
You can then test the paths you are interested in against the values returned by find_anchor_alias_paths, or the key against the final elements of such paths.
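One possible way to do that test is sketched here. The helper `classify_path` is a hypothetical name, not part of ruamel.yaml, and to keep the snippet runnable on its own the dict is a literal copy of the result printed above rather than the output of a fresh load.

```python
def classify_path(anchor_alias_paths, path):
    """Return (kind, anchor_name), where kind is 'anchor' or 'alias'.

    Returns (None, None) if path does not lead to an anchored or aliased node.
    """
    for name, paths in anchor_alias_paths.items():
        if not paths:
            continue
        if path == paths[0]:      # the first recorded path is the anchor
            return 'anchor', name
        if path in paths[1:]:     # every later path is an alias
            return 'alias', name
    return None, None


# Literal copy of the result shown above (assumed, not recomputed here).
anchor_alias_paths = {
    'some_anchor': [['foo'], ['baz', 'xyz', 0]],
    'anchored_num': [['foo', 1, 'klm'], ['oof', 3, 'magic', 0]],
}

print(classify_path(anchor_alias_paths, ['foo']))            # ('anchor', 'some_anchor')
print(classify_path(anchor_alias_paths, ['baz', 'xyz', 0]))  # ('alias', 'some_anchor')
print(classify_path(anchor_alias_paths, ['oof']))            # (None, None)
```

To match on the key alone instead of the full path, compare the key against `p[-1]` for each recorded path `p`; that is ambiguous as soon as the same key occurs at several places in the document.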
Upvotes: 0
Reputation: 4732
Since PyYAML and ruamel.yaml load an alias node as a reference to the object loaded from the corresponding anchor node, you can traverse the object tree and check whether each node is a reference to a previously visited object.
The following is a simple example only checking dictionaries.
from ruamel.yaml import YAML

root = YAML().load('''
foo: &some_anchor
  bar: 1
baz: *some_anchor
''')

dict_ids = set()

def visit(parent):
    if isinstance(parent, dict):
        i = id(parent)
        print(parent, ', is_alias:', i in dict_ids)
        dict_ids.add(i)
        for k, v in parent.items():
            visit(v)
    elif isinstance(parent, list):
        for e in parent:
            visit(e)

visit(root)
This will output the following.
ordereddict([('foo', ordereddict([('bar', 1)])), ('baz', ordereddict([('bar', 1)]))]) , is_alias: False
ordereddict([('bar', 1)]) , is_alias: False
ordereddict([('bar', 1)]) , is_alias: True
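The dictionary-only walk above could be extended to lists and made to return paths instead of printing. The following is a rough sketch under that assumption: `find_shared_containers` is a hypothetical helper, and `root` is built by hand here as a stand-in for a loaded document so the snippet runs without a YAML library. It only looks at dict and list objects, because id() comparisons are not meaningful for scalars (CPython, for example, caches small integers).

```python
def find_shared_containers(node, path=(), seen=None, repeats=None):
    """Map the path of a container's first visit to the later paths reaching it."""
    if seen is None:
        seen = {}      # id(container) -> path of first visit
    if repeats is None:
        repeats = {}   # first-visit path -> list of later paths
    if isinstance(node, (dict, list)):
        first = seen.get(id(node))
        if first is not None:
            repeats.setdefault(first, []).append(path)
            return repeats  # already visited: do not descend again
        seen[id(node)] = path
        items = node.items() if isinstance(node, dict) else enumerate(node)
        for key, value in items:
            find_shared_containers(value, path + (key,), seen, repeats)
    return repeats


# Hand-built stand-in: 'shared' plays the role of an anchored mapping.
shared = {'bar': 1}
root = {'foo': shared, 'baz': [shared, {'bar': 1}]}
print(find_shared_containers(root))  # {('foo',): [('baz', 0)]}
```

The equal but separate `{'bar': 1}` at `('baz', 1)` is not reported, since only object identity counts.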
Upvotes: 1