Avl8

Reputation: 23

Identify if yaml key is anchor or pointer

I use ruamel.yaml in order to parse YAML files and I'd like to identify if the key is the anchor itself or just a pointer. Given the following:

foo: &some_anchor
  bar: 1

baz: *some_anchor

I'd like to understand that foo is the actual anchor and baz is a pointer. From what I can see, there's an anchor property on the node (and also yaml_anchor method), but both baz and foo show that their anchor is some_anchor - meaning that I cannot differentiate.

How can I get this info?

Upvotes: 2

Views: 1603

Answers (2)

Anthon

Reputation: 76682

In your example, &some_anchor is the anchor for the single-element mapping bar: 1, and *some_anchor is the alias. Writing "foo is the actual anchor and baz is a pointer" is, IMO, both incorrect terminology and a confusion of keys with their (anchored/aliased) values. If you had a YAML document:

- 3
- 5
- 9
- &some_anchor
  bar: 1
- 42
- *some_anchor

would you actually say, probably after carefully counting, that "4 is the anchor and 6 is the pointer" (or 3 and 5, depending on where you start counting)?

If you want to test whether a key of a dict has a value that was an anchored node in the YAML, or whether that value was an aliased node, you have to look at the value, and you'll find that the values for the keys foo and baz are the same Python data structure. On dumping, which key's value gets the anchor and which key's (or keys') values are dumped as aliases is determined entirely by which one gets dumped first, because the YAML specification states that an anchor has to come before its use as an alias (an anchor can come after an alias if it is re-defined).

As @relent95 describes, you should recursively walk over the data structure you loaded (to see which key is reached first) and, in both ruamel.yaml and PyYAML, look at the id(). For PyYAML, however, that only works for complex data (dict, list, objects), as it throws away anchoring information and you will not find the same id() on e.g. an anchored integer value.

The alternative to using the id is to look at the actual anchor name, which ruamel.yaml stores in the attribute/property anchor. If you know up front that your YAML document is as simple as your example (anchored/aliased nodes are values of the root-level mapping), you can do:

import ruamel.yaml

yaml_str = """\
foo: &some_anchor
  bar: 1

baz: *some_anchor
oof: 42
"""

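def is_keys_value_anchor(key, data, verbose=0):
    # Returns True if the key's value is the first occurrence of its anchor,
    # False if it is a later occurrence (an alias), None if it has no anchor.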
    anchor_found = set()
    for k, v in data.items():
        res = None
        try:
            anchor = v.anchor.value
            if anchor is not None:
                res = anchor not in anchor_found
                anchor_found.add(anchor)
        except AttributeError:
            pass
        if k == key:
            break
    if verbose > 0:
        print(f'key "{key}" {res}')
    return res
    
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
is_keys_value_anchor('foo', data, verbose=1)
is_keys_value_anchor('baz', data, verbose=1)
is_keys_value_anchor('oof', data, verbose=1)

which gives:

key "foo" True
key "baz" False
key "oof" None

But this is inefficient for root mappings with lots of keys, and it won't find anchors/aliases that are nested deeply in the document. A more generic approach is to recursively walk the data structure once and create a dict that has the anchor name as key and a list of "paths" as value. A path is itself a list of keys/indices with which you can traverse the data structure starting at the root. The first path in the list is the anchor, the rest are aliases:

import ruamel.yaml

yaml_str = """\
foo: &some_anchor
  - bar: 1
  - klm: &anchored_num 42

baz:
    xyz:
    - *some_anchor
oof: [1, 2, c: 13, magic: [*anchored_num]]
"""

def find_anchor_alias_paths(data, path=None, res=None):
    def check_add_anchor(d, path, anchors):
        # returns False when an alias is found, to prevent recursing into a node twice.
        try:
            anchor = d.anchor.value
            if anchor is not None:
                tmp = anchors.setdefault(anchor, [])
                tmp.append(path)
                return len(tmp) == 1
        except AttributeError:
            pass
        return True

    if path is None:
        path = []
    if res is None:
        res = {}
    if isinstance(data, dict):
        for k, v in data.items():
            next_path = path.copy()
            next_path.append(k)
            if check_add_anchor(v, next_path, res):
                find_anchor_alias_paths(v, next_path, res)
    elif isinstance(data, list):
        for idx, elem in enumerate(data):
            next_path = path.copy()
            next_path.append(idx)
            if check_add_anchor(elem, next_path, res):
                find_anchor_alias_paths(elem, next_path, res)
    return res


yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
anchor_alias_paths = find_anchor_alias_paths(data)
for anchor, paths in anchor_alias_paths.items():
    print(f'anchor: "{anchor}", anchor_path: {paths[0]}, alias_path(s): {paths[1:]}')
print('value for last anchor/alias found', data.mlget(paths[-1], list_ok=True))

which gives:

anchor: "some_anchor", anchor_path: ['foo'], alias_path(s): [['baz', 'xyz', 0]]
anchor: "anchored_num", anchor_path: ['foo', 1, 'klm'], alias_path(s): [['oof', 3, 'magic', 0]]
value for last anchor/alias found 42

You can then test the paths you are interested in against the values returned by find_anchor_alias_paths, or the key against the final elements of such paths.
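One way to do that lookup, as a minimal sketch: the helper classify_path below is hypothetical (it is not part of ruamel.yaml), and the dictionary is copied literally from the output shown above so that the snippet runs on its own.

```python
def classify_path(anchor_alias_paths, path):
    """Return (anchor_name, 'anchor' or 'alias') for path, or None if it is neither."""
    for anchor, paths in anchor_alias_paths.items():
        if path == paths[0]:
            # the first recorded path is where the anchor was defined
            return anchor, 'anchor'
        if path in paths[1:]:
            # every later path is a place where the alias was used
            return anchor, 'alias'
    return None


# Same shape as the result of find_anchor_alias_paths printed above.
anchor_alias_paths = {
    'some_anchor': [['foo'], ['baz', 'xyz', 0]],
    'anchored_num': [['foo', 1, 'klm'], ['oof', 3, 'magic', 0]],
}

print(classify_path(anchor_alias_paths, ['foo']))
print(classify_path(anchor_alias_paths, ['baz', 'xyz', 0]))
print(classify_path(anchor_alias_paths, ['oof']))
```

This scans every anchor on each call, which is fine for a handful of anchors; for many lookups you could invert the dictionary once into a mapping from tuple(path) to the anchor name.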

Upvotes: 0

relent95

Reputation: 4732

Since PyYAML and ruamel.yaml load an alias node as a reference to the object loaded from the corresponding anchor node, you can traverse the object tree and check whether each node is a reference to a previously visited object.

The following is a simple example that only checks dictionaries.

from ruamel.yaml import YAML

root = YAML().load('''
foo: &some_anchor
  bar: 1

baz: *some_anchor
''')
dict_ids = set()
def visit(parent):
    if isinstance(parent, dict):
        i = id(parent)
        print(parent, ', is_alias:', i in dict_ids)
        dict_ids.add(i)
        for k, v in parent.items():
            visit(v)
    elif isinstance(parent, list):
        for e in parent:
            visit(e)
visit(root)

This will output the following.

ordereddict([('foo', ordereddict([('bar', 1)])), ('baz', ordereddict([('bar', 1)]))]) , is_alias: False
ordereddict([('bar', 1)]) , is_alias: False
ordereddict([('bar', 1)]) , is_alias: True
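
A variant of the same idea, sketched here with a hand-built structure standing in for the loaded YAML, records the path at which each container is first reached and the paths at which it is reached again. The function name find_shared_containers and the sample data are illustrative only and not part of either library. The sketch assumes, as described above, that the loader returns the very same object for an anchored node and its aliases, and it only tracks dicts and lists, because identity checks on scalars (small integers, for instance) are not reliable in CPython.

```python
def find_shared_containers(root):
    """Return a list of path lists, one per dict/list reached by more than one path.

    In each path list, the first path is where the object was first visited.
    """
    seen = {}  # id(container) -> list of paths leading to it

    def visit(node, path):
        if not isinstance(node, (dict, list)):
            return  # scalars are skipped: identity is not meaningful for them
        paths = seen.setdefault(id(node), [])
        paths.append(path)
        if len(paths) > 1:
            return  # already walked this object; this also guards against cycles
        items = node.items() if isinstance(node, dict) else enumerate(node)
        for key, child in items:
            visit(child, path + [key])

    visit(root, [])
    return [paths for paths in seen.values() if len(paths) > 1]


# Hand-built stand-in for a loaded document: 'baz' refers to the same dict as 'foo',
# while the second element of 'oof' is an equal but distinct dict.
shared = {'bar': 1}
root = {'foo': shared, 'baz': shared, 'oof': [shared, {'bar': 1}]}

for first, *repeats in find_shared_containers(root):
    print('first seen at', first, 'seen again at', repeats)
```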

Upvotes: 1
