Geoff Lawler

Reputation: 81

Generating anchors with PyYAML.dump()?

I'd like to be able to generate anchors in the YAML generated by PyYAML's dump() function. Is there a way to do this? Ideally the anchors would have the same name as the YAML nodes.

Example:

import yaml
yaml.dump({'a': [1,2,3]})
'a: [1, 2, 3]\n'

What I'd like to be able to do is generate YAML like:

import yaml
yaml.dump({'a': [1,2,3]})
'a: &a [1, 2, 3]\n'

Can I write a custom emitter or dumper to do this? Is there another way?

Upvotes: 8

Views: 7914

Answers (5)

Eliot Blennerhassett

Reputation: 140

This is not so easy unless the data you want to use for the anchor is inside the node itself. The anchor is attached to the node's contents ('[1,2,3]' in your example), and that node does not know its value is associated with the key 'a'.

import yaml

l = [1, 2, 3]
foo = {'a': l, 'b': l}

class SpecialAnchor(yaml.Dumper):

    def generate_anchor(self, node):
        print('Generating anchor for {}'.format(str(node)))
        anchor = super().generate_anchor(node)
        print('Generated "{}"'.format(anchor))
        return anchor

# pass the subclass defined above as the Dumper
y1 = yaml.dump(foo, Dumper=SpecialAnchor)
print(y1)

Gives you:

Generating anchor for SequenceNode(
    tag='tag:yaml.org,2002:seq', value= 
        [ScalarNode(tag='tag:yaml.org,2002:int', value='1'), 
         ScalarNode(tag='tag:yaml.org,2002:int', value='2'), 
         ScalarNode(tag='tag:yaml.org,2002:int', value='3')]
    )
Generated "id001"
a: &id001 [1, 2, 3]
b: *id001

So far I haven't found a way to get the key 'a' given the node...

Upvotes: 3

aaa90210

Reputation: 12083

I wrote a custom dumper class that forces an anchor name for top-level nodes. It does not simply override the anchor string (via generate_anchor), but forces the anchor to be emitted even if the node is never referenced later:

import yaml

class CustomAnchor(yaml.Dumper):
    def __init__(self, *args, **kwargs):
        super(CustomAnchor, self).__init__(*args, **kwargs)
        self.depth = 0        # number of anchor_node() calls so far
        self.basekey = None   # name of the top-level key
        self.newanchors = {}  # value node -> forced anchor name

    def anchor_node(self, node):
        self.depth += 1
        if self.depth == 2:
            # second call: the first key of the top-level mapping
            assert isinstance(node, yaml.ScalarNode), "yaml node not a string: %s" % node
            self.basekey = str(node.value)
            node.value = self.basekey + "_ALIAS"
        if self.depth == 3:
            # third call: the value that belongs to that key
            assert self.basekey, "could not find base key for value: %s" % node
            self.newanchors[node] = self.basekey
        super(CustomAnchor, self).anchor_node(node)
        if self.newanchors:
            # overwrite the entry set by the parent class to force the anchor
            self.anchors.update(self.newanchors)
            self.newanchors.clear()

Note that I override the node name to be suffixed with "_ALIAS", but you could strip that line to leave the node name and anchor name the same, or change it to something else.

E.g. dumping {'FOO': 'BAR'} results in:

FOO_ALIAS: &FOO BAR

Also, I only wrote it to deal with single top level key/value pairs at a time, and it will only force an anchor for the top level key. If you want to turn a dict into a YAML file with all the keys being top level YAML nodes, you will need to iterate over the dict and dump each key/value pair as {key:value}, or rewrite this class to handle a dict with multiple keys.
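For what it's worth, handling several top-level keys in one dump (instead of dumping pair by pair) could look roughly like the sketch below. TopLevelKeyAnchorDumper and forced_anchors are hypothetical names, not part of PyYAML, and the approach relies on the serializer's internal serialize(), anchor_node() and self.anchors, which are not documented API and may change. It also assumes every top-level key is a valid anchor name (letters, digits, '-' and '_'); other keys make the emitter raise an error.

```python
import yaml


class TopLevelKeyAnchorDumper(yaml.SafeDumper):
    """Hypothetical dumper: anchor each top-level value under its key name."""

    def serialize(self, node):
        # Map each value node of the root mapping to the key it sits under.
        # setdefault keeps the first key when one object is shared by several keys.
        self.forced_anchors = {}
        if isinstance(node, yaml.MappingNode):
            for key_node, value_node in node.value:
                if isinstance(key_node, yaml.ScalarNode):
                    self.forced_anchors.setdefault(value_node, key_node.value)
        super().serialize(node)

    def anchor_node(self, node):
        # Let PyYAML register the node and all of its children first ...
        super().anchor_node(node)
        # ... then replace the anchor entry with the key name, which makes
        # the emitter write "&name" even if no alias follows later.
        if node in self.forced_anchors:
            self.anchors[node] = self.forced_anchors[node]


data = {'a': [1, 2, 3], 'b': 'x'}
text = yaml.dump(data, Dumper=TopLevelKeyAnchorDumper, default_flow_style=None)
print(text)
```

With default_flow_style=None the list is written in flow style, so the first line comes out as a: &a [1, 2, 3], which is the form asked for in the question. Loading the result with yaml.safe_load gives back the original dict.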

Upvotes: 2

Andy

Reputation: 3215

I couldn't get @beeb's answer to run at all, so I went ahead and tried to generalize @aaa90210's answer:

import yaml

class _CustomAnchor(yaml.Dumper):
    anchor_tags = {}  # {node value: anchor name}, filled in by the factory below

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.new_anchors = {}
        self.anchor_next = None

    def anchor_node(self, node):
        # the previous node was a matching key, so this node is its value
        if self.anchor_next is not None:
            self.new_anchors[node] = self.anchor_next
            self.anchor_next = None
        # remember the anchor name to apply to the node visited next
        if isinstance(node.value, str) and node.value in self.anchor_tags:
            self.anchor_next = self.anchor_tags[node.value]

        super().anchor_node(node)

        if self.new_anchors:
            self.anchors.update(self.new_anchors)
            self.new_anchors.clear()


def CustomAnchor(tags):
    return type('CustomAnchor', (_CustomAnchor,), {'anchor_tags': tags})


# example data: one list shared by two keys
l = [1, 2, 3]
foo = {'a': l, 'b': l}
print(yaml.dump(foo, Dumper=CustomAnchor({'a': 'a_name'})))

This does not offer a way to differentiate between two nodes with the same value. That would require a YAML equivalent of XML's XPath, which I do not see in PyYAML :(

The class factory CustomAnchor lets you pass in a dictionary of anchors based on node values: {value: anchor_name}.

Upvotes: 1

beeb

Reputation: 1217

This question is quite old and there are already some good pointers by aaa90210 in his answer, but the provided class did not really do what I wanted and I think it doesn't generalize well.

I tried to come up with a dumper that allows adding anchors and makes sure corresponding aliases are created if the keys come up again later in the file.

By no means is this fully featured and it can probably be made safer, but I hope it can be of inspiration to others:

import yaml
from typing import Dict


class CustomAnchor(yaml.Dumper):
    """Custom Dumper class to create anchors for keys throughout the YAML file.

    Attributes:
        added_anchors: mapping of key names to the node objects representing their value, for nodes that have an anchor
    """

    def __init__(self, *args, **kwargs):
        """Initialize class.

        We call the constructor of the parent class.
        """
        super().__init__(*args, **kwargs)
        self.filter_keys = ['a', 'b']
        self.added_anchors: Dict[str, yaml.Node] = {}

    def anchor_node(self, node):
        """Override method from parent class.

        This method first checks whether the node contains any keys of interest. If an anchor already exists for
        such a key, it replaces the reference to the value node with the one that the anchor points to. If no anchor
        exists for the key, it creates one and keeps a reference to the value node in the ``added_anchors`` attribute.

        Args:
            node (yaml.Node): the node being processed by the dumper
        """
        if isinstance(node, yaml.MappingNode):
            # let's check through the mapping to find keys which are of interest
            for i, (key_node, value_node) in enumerate(node.value):
                if (
                    isinstance(key_node, yaml.ScalarNode)
                    and key_node.value in self.filter_keys
                ):
                    if key_node.value in self.added_anchors:  # anchor exists
                        # replace value node to tell the dumper to create an alias
                        node.value[i] = (key_node, self.added_anchors[key_node.value])
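                    else:  # no anchor exists yet, so we need to create one
                        # register the value node and its children first; otherwise
                        # list or dict values raise a KeyError when serialized
                        self.anchor_node(value_node)
                        self.anchors[value_node] = key_node.value
                        self.added_anchors[key_node.value] = value_node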
        super().anchor_node(node)

Upvotes: 0

AlexH

Reputation: 85

By default, anchors are only emitted when the dumper detects a reference to an object it has already seen:

>>> import yaml
>>>
>>> foo = {'a': [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.safe_dump(doc, default_flow_style=False))
- &id001
  a:
  - 1
  - 2
  - 3
- *id001
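To make "an object it has already seen" concrete: the check is on object identity, not equality. A small sketch (the variable names are just for illustration):

```python
import yaml

shared = [1, 2, 3]

# The same list object appears under both keys, so an anchor/alias pair is emitted.
same_object = yaml.safe_dump({'a': shared, 'b': shared})

# Two equal but separate list objects: no anchor, the data is simply written twice.
equal_copies = yaml.safe_dump({'a': [1, 2, 3], 'b': [1, 2, 3]})

print(same_object)
print(equal_copies)
```

Only the first dump contains &id001 and *id001.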

If you want to override how it is named, you'll have to customize the Dumper class, specifically the generate_anchor() function. ANCHOR_TEMPLATE may also be useful.
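As a minimal sketch of the ANCHOR_TEMPLATE route: the serializer builds generated names from that class attribute (the default is 'id%03d') and a running counter, so a subclass can change the prefix without touching generate_anchor(). PrefixedAnchorDumper and the 'ref' prefix are made up for this example. Note that this only renames anchors that would be emitted anyway; it does not force an anchor on unshared nodes.

```python
import yaml


class PrefixedAnchorDumper(yaml.SafeDumper):
    # Hypothetical subclass: generated anchors become ref001, ref002, ...
    ANCHOR_TEMPLATE = 'ref%03d'


shared = {'a': [1, 2, 3]}
doc = [shared, shared]  # same object twice, so an anchor and an alias are emitted

text = yaml.dump(doc, Dumper=PrefixedAnchorDumper, default_flow_style=False)
print(text)
```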

In your example, the node name is simple, but you need to take into account the many possibilities for YAML keys, i.e. a key could be a sequence rather than a single value:

>>> import yaml
>>>
>>> foo = {('a', 'b', 'c'): [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.dump(doc, default_flow_style=False))
!!python/tuple
- &id001
  ? !!python/tuple
  - a
  - b
  - c
  : - 1
    - 2
    - 3
- *id001

Upvotes: 7
