Reputation: 81
I'd like to be able to generate anchors in the YAML generated by PyYAML's dump() function. Is there a way to do this? Ideally the anchors would have the same name as the YAML nodes.
Example:
import yaml
yaml.dump({'a': [1,2,3]})
'a: [1, 2, 3]\n'
What I'd like to be able to do is generate YAML like:
import yaml
yaml.dump({'a': [1,2,3]})
'a: &a [1, 2, 3]\n'
Can I write a custom emitter or dumper to do this? Is there another way?
Upvotes: 8
Views: 7914
Reputation: 140
This is not so easy, unless the data that you want to use for the anchor is inside the node. The anchor gets attached to the node contents (in your example, '[1,2,3]'), and the node doesn't know that this value is associated with key 'a'.
import yaml

l = [1, 2, 3]
foo = {'a': l, 'b': l}

class SpecialAnchor(yaml.Dumper):
    def generate_anchor(self, node):
        # Log each node that needs an anchor, then defer to the default naming.
        print('Generating anchor for {}'.format(str(node)))
        anchor = super().generate_anchor(node)
        print('Generated "{}"'.format(anchor))
        return anchor

y1 = yaml.dump(foo, Dumper=SpecialAnchor)
print(y1)
Gives you:
Generating anchor for SequenceNode(
tag='tag:yaml.org,2002:seq', value=
[ScalarNode(tag='tag:yaml.org,2002:int', value='1'),
ScalarNode(tag='tag:yaml.org,2002:int', value='2'),
ScalarNode(tag='tag:yaml.org,2002:int', value='3')]
)
Generated "id001"
a: &id001 [1, 2, 3]
b: *id001
So far I haven't found a way to get the key 'a' given the node...
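One possible workaround, offered as a sketch rather than a confirmed PyYAML recipe: anchor_node sees each MappingNode before its children, so a dumper can remember which key a value node first appeared under and return that key from generate_anchor. The class name KeyNamedAnchorDumper and the _key_for_node attribute are made up for this illustration. It assumes the keys are plain strings that are also valid anchor names, and, like the default dumper, it only emits an anchor when the same object is referenced more than once.

```python
import yaml


class KeyNamedAnchorDumper(yaml.Dumper):
    """Hypothetical dumper: name anchors after the first key that holds the node."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Maps a value node to the key it was first seen under.
        self._key_for_node = {}

    def anchor_node(self, node):
        if isinstance(node, yaml.MappingNode):
            for key_node, value_node in node.value:
                if isinstance(key_node, yaml.ScalarNode):
                    # setdefault keeps the first key when a node is shared.
                    self._key_for_node.setdefault(value_node, key_node.value)
        super().anchor_node(node)

    def generate_anchor(self, node):
        # Fall back to the default id001-style name when no key is known.
        name = self._key_for_node.get(node)
        if name is None:
            return super().generate_anchor(node)
        return name


l = [1, 2, 3]
foo = {'a': l, 'b': l}
out = yaml.dump(foo, Dumper=KeyNamedAnchorDumper)
print(out)
```

With the shared list above, the anchor should be named after 'a' and 'b' should become an alias to it. Keys containing spaces or other characters that are not allowed in anchor names would need to be sanitized first.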
Upvotes: 3
Reputation: 12083
I wrote a custom dumper class to force an anchor value for top-level nodes. It does not simply override the anchor string (using generate_anchor), but actually forces the anchor to be emitted, even if the node is not referenced later:
import yaml

class CustomAnchor(yaml.Dumper):
    def __init__(self, *args, **kwargs):
        super(CustomAnchor, self).__init__(*args, **kwargs)
        self.depth = 0
        self.basekey = None
        self.newanchors = {}

    def anchor_node(self, node):
        self.depth += 1
        if self.depth == 2:
            # Second node visited: the top-level key.
            assert isinstance(node, yaml.ScalarNode), "yaml node not a string: %s" % node
            self.basekey = str(node.value)
            node.value = self.basekey + "_ALIAS"
        if self.depth == 3:
            # Third node visited: the value of that key, which gets the anchor.
            assert self.basekey, "could not find base key for value: %s" % node
            self.newanchors[node] = self.basekey
        super(CustomAnchor, self).anchor_node(node)
        if self.newanchors:
            self.anchors.update(self.newanchors)
            self.newanchors.clear()
Note that I override the node name to be suffixed with "_ALIAS", but you could strip that line to leave the node name and anchor name the same, or change it to something else.
E.g. dumping {'FOO': 'BAR'} results in:
FOO_ALIAS: &FOO BAR
Also, I only wrote it to deal with single top level key/value pairs at a time, and it will only force an anchor for the top level key. If you want to turn a dict into a YAML file with all the keys being top level YAML nodes, you will need to iterate over the dict and dump each key/value pair as {key:value}, or rewrite this class to handle a dict with multiple keys.
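The "iterate over the dict and dump each key/value pair" suggestion could look roughly like the sketch below. The class body repeats the dumper from this answer; the helper name dump_with_top_level_anchors and the sample data are made up for the illustration, and default_flow_style=False is passed explicitly only so the output is block style regardless of the PyYAML version.

```python
import yaml


class CustomAnchor(yaml.Dumper):
    """Dumper from the answer above: anchors the value of one top-level key."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.depth = 0
        self.basekey = None
        self.newanchors = {}

    def anchor_node(self, node):
        self.depth += 1
        if self.depth == 2:
            assert isinstance(node, yaml.ScalarNode), "yaml node not a string: %s" % node
            self.basekey = str(node.value)
            node.value = self.basekey + "_ALIAS"
        if self.depth == 3:
            assert self.basekey, "could not find base key for value: %s" % node
            self.newanchors[node] = self.basekey
        super().anchor_node(node)
        if self.newanchors:
            self.anchors.update(self.newanchors)
            self.newanchors.clear()


def dump_with_top_level_anchors(data):
    """Hypothetical helper: dump each pair separately and join the results."""
    # Each yaml.dump call creates a fresh dumper, so depth starts at 0 again.
    chunks = [
        yaml.dump({key: value}, Dumper=CustomAnchor, default_flow_style=False)
        for key, value in data.items()
    ]
    return "".join(chunks)


text = dump_with_top_level_anchors({'FOO': 'BAR', 'BAZ': 'QUX'})
print(text)
```

Because the pieces are concatenated into a single document, the anchors defined for earlier keys remain available to anything appended later in the same file.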
Upvotes: 2
Reputation: 3215
I couldn't get @beeb's answer to run at all, so I went ahead and tried to generalize @aaa90210's answer:
import yaml

class _CustomAnchor(yaml.Dumper):
    anchor_tags = {}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.new_anchors = {}
        self.anchor_next = None

    def anchor_node(self, node):
        # The node visited right after a matching key is that key's value.
        if self.anchor_next is not None:
            self.new_anchors[node] = self.anchor_next
            self.anchor_next = None
        if isinstance(node.value, str) and node.value in self.anchor_tags:
            self.anchor_next = self.anchor_tags[node.value]
        super().anchor_node(node)
        if self.new_anchors:
            self.anchors.update(self.new_anchors)
            self.new_anchors.clear()

def CustomAnchor(tags):
    return type('CustomAnchor', (_CustomAnchor,), {'anchor_tags': tags})

foo = {'a': [1, 2, 3]}  # sample data, not part of the original snippet
print(yaml.dump(foo, Dumper=CustomAnchor({'a': 'a_name'})))
This does not offer a way to differentiate between two nodes with the same name value; that would require a YAML equivalent of XML's XPath, which I do not see in PyYAML :(
The class factory CustomAnchor lets you pass in a dictionary of anchors based on node values: {value: anchor_name}.
Upvotes: 1
Reputation: 1217
This question is quite old and there are already some good pointers by aaa90210 in his answer, but the provided class did not really do what I wanted and I think it doesn't generalize well.
I tried to come up with a dumper that lets you add anchors and makes sure corresponding aliases are created if the keys come up again later in the file.
By no means is this fully featured and it can probably be made safer, but I hope it can be an inspiration to others:
import yaml
from typing import Dict

class CustomAnchor(yaml.Dumper):
    """Custom Dumper class to create anchors for keys throughout the YAML file.

    Attributes:
        added_anchors: mapping of key names to the node objects representing their value, for nodes that have an anchor
    """

    def __init__(self, *args, **kwargs):
        """Initialize class.

        We call the constructor of the parent class.
        """
        super().__init__(*args, **kwargs)
        self.filter_keys = ['a', 'b']
        self.added_anchors: Dict[str, yaml.ScalarNode] = {}

    def anchor_node(self, node):
        """Override method from parent class.

        This method first checks if the node contains the keys of interest, and if anchors already exist for these keys,
        replaces the reference to the value node with the one that the anchor points to. In case no anchor exists for
        those keys, it creates them and keeps a reference to the value node in the ``added_anchors`` attribute.

        Args:
            node (yaml.Node): the node being processed by the dumper
        """
        if isinstance(node, yaml.MappingNode):
            # let's check through the mapping to find keys which are of interest
            for i, (key_node, value_node) in enumerate(node.value):
                if (
                    isinstance(key_node, yaml.ScalarNode)
                    and key_node.value in self.filter_keys
                ):
                    if key_node.value in self.added_anchors:  # anchor exists
                        # replace value node to tell the dumper to create an alias
                        node.value[i] = (key_node, self.added_anchors[key_node.value])
                    else:  # no anchor yet exists but we need to create one
                        self.anchors.update({value_node: key_node.value})
                        self.added_anchors[key_node.value] = value_node
        super().anchor_node(node)
Upvotes: 0
Reputation: 85
By default, anchors are only emitted when the dumper detects a reference to an object it has previously seen:
>>> import yaml
>>>
>>> foo = {'a': [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.safe_dump(doc, default_flow_style=False))
- &id001
  a:
  - 1
  - 2
  - 3
- *id001
If you want to override how it is named, you'll have to customize the Dumper class, specifically the generate_anchor() function. ANCHOR_TEMPLATE may also be useful.
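As a minimal sketch of the ANCHOR_TEMPLATE route: the default generate_anchor() formats the template with a running counter, so overriding the class attribute in a subclass changes the naming pattern without touching any method. RefAnchorDumper and the 'ref%03d' pattern are names chosen for this example only.

```python
import yaml


class RefAnchorDumper(yaml.SafeDumper):
    # generate_anchor() applies this %-format string to a running counter.
    ANCHOR_TEMPLATE = 'ref%03d'


foo = {'a': [1, 2, 3]}
doc = [foo, foo]  # the same dict twice, so an anchor and an alias are emitted

out = yaml.dump(doc, Dumper=RefAnchorDumper, default_flow_style=False)
print(out)
```

This only changes how generated anchors are named; anchors are still emitted only for objects that appear more than once.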
In your example, the node name is simple, but you need to take into account the many possibilities for YAML keys, i.e. a key could be a sequence rather than a single value:
>>> import yaml
>>>
>>> foo = {('a', 'b', 'c'): [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.dump(doc, default_flow_style=False))
!!python/tuple
- &id001
  ? !!python/tuple
  - a
  - b
  - c
  : - 1
    - 2
    - 3
- *id001
Upvotes: 7