rbatt

Reputation: 4807

YAML mapping order not preserved when using alias and yamlordereddictloader loader

I want to load a YAML file into Python as an OrderedDict. I am using yamlordereddictloader to preserve ordering.

However, I notice that the keys merged in through the alias (<<: *dt) are placed "too soon" in the resulting OrderedDict: they come before keys that precede the merge in the YAML file.

How can I preserve the order of this mapping when read into Python, ideally as an OrderedDict? Is it possible to achieve this result without writing some custom parsing?

A minimal example that reproduces the problem:

import yaml
import yamlordereddictloader

yaml_file = """
d1:
  id:
    nm1: val1
  dt: &dt
    nm2: val2
    nm3: val3

d2: # expect nm4, nm2, nm3
  nm4: val4
  <<: *dt
"""

out = yaml.load(yaml_file, Loader=yamlordereddictloader.Loader)
keys = [x for x in out['d2']]
print(keys) # ['nm2', 'nm3', 'nm4']
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"

Upvotes: 1

Views: 313

Answers (1)

flyx

Reputation: 39768

Is it possible to achieve this result without writing some custom parsing?

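Yes. You need to override the flatten_mapping method, which the loader inherits from PyYAML's SafeConstructor. The stock implementation puts all merged key-value pairs in front of the mapping's own pairs, which is why nm2 and nm3 come out before nm4. Here's a basic working example: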

import yaml
import yamlordereddictloader
from yaml.constructor import ConstructorError
from yaml.nodes import MappingNode, SequenceNode

class MyLoader(yamlordereddictloader.Loader):
  # taken from here and reengineered to keep order:
  # https://github.com/yaml/pyyaml/blob/5.3.1/lib/yaml/constructor.py#L207
  def flatten_mapping(self, node):
    merged = []

    def merge_from(source):
      # a merge source must itself be a mapping
      if not isinstance(source, MappingNode):
        raise ConstructorError("while constructing a mapping",
            source.start_mark, "expected mapping for merging, but found %s" %
            source.id, source.start_mark)
      # resolve nested merges first, then add the pairs at the current position
      self.flatten_mapping(source)
      merged.extend(source.value)

    for key_node, value_node in node.value:
      if key_node.tag == 'tag:yaml.org,2002:merge':
        # "<<" takes either a single mapping or a sequence of mappings
        if isinstance(value_node, SequenceNode):
          for subnode in value_node.value:
            merge_from(subnode)
        else:
          merge_from(value_node)
      else:
        if key_node.tag == 'tag:yaml.org,2002:value':
          key_node.tag = 'tag:yaml.org,2002:str'
        # ordinary key: keep it at its original position
        merged.append((key_node, value_node))
    node.value = merged

yaml_file = """
d1:
  id:
    nm1: val1
  dt: &dt
    nm2: val2
    nm3: val3

d2: # expect nm4, nm2, nm3
  nm4: val4
  <<: *dt
"""

out = yaml.load(yaml_file, Loader=MyLoader)
keys = [x for x in out['d2']]
print(keys)
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"

This is not the fastest approach, since it copies every key-value pair of every mapping once during loading, but it works. Improving the performance is left as an exercise for the reader :).
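One caveat about ordering versus precedence: the YAML merge key specification says that keys written explicitly in a mapping win over merged ones, and that earlier merge sources win over later ones. PyYAML's stock flatten_mapping gets the first part by putting merged pairs first, so the explicit pairs overwrite them afterwards. When merged pairs are appended in place instead, a merged key that follows an explicit key of the same name would, as far as I can tell, overwrite it. Below is a minimal sketch of the de-duplication logic on plain Python data rather than on PyYAML nodes. The function name flatten_in_order, the MERGE constant and the data representation (a list of (key, value) pairs, with merge sources given as already-flattened dicts) are assumptions made for illustration and are not part of PyYAML or yamlordereddictloader.

```python
MERGE = "<<"  # hypothetical stand-in for a key node tagged tag:yaml.org,2002:merge


def flatten_in_order(pairs):
    """Expand merge entries at their position in `pairs`.

    `pairs` is a list of (key, value) tuples. The value of a MERGE entry is
    either one dict or a list of dicts; those dicts are assumed to contain no
    merge entries of their own. Explicit keys take precedence over merged
    keys, and earlier merge sources take precedence over later ones.
    """
    # keys written explicitly anywhere in this mapping always win
    explicit = {key for key, _ in pairs if key != MERGE}
    seen = set()
    flat = []
    for key, value in pairs:
        if key != MERGE:
            flat.append((key, value))
            seen.add(key)
            continue
        sources = value if isinstance(value, list) else [value]
        for source in sources:
            for merged_key, merged_value in source.items():
                # skip keys that are explicit or were already merged earlier
                if merged_key in explicit or merged_key in seen:
                    continue
                flat.append((merged_key, merged_value))
                seen.add(merged_key)
    return flat


dt = {"nm2": "val2", "nm3": "val3"}
print(flatten_in_order([("nm4", "val4"), (MERGE, dt)]))
print(flatten_in_order([("nm2", "own"), (MERGE, dt)]))
```

The same filtering could probably be applied inside merge_from, for example by comparing the value attribute of scalar key nodes, at the cost of one extra pass over the mapping. I have not measured how much that costs on large files.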

Upvotes: 1
