David Moreno García

Reputation: 4523

Load YAML preserving order

I have a Python library that defines a list like the following one:

config = [
    'task_a',
    ('task_b', {'task_b_opt_1': ' '}),
    ('task_a', {'task_a_opt_1': 42}),
    ('task_c', {'task_c_opt_1': 'foo', 'task_c_opt_2': 'bar'}),
    'task_b'
]

Basically, this list defines 5 tasks that have to be applied in that specific order, using the parameters defined (if any). The same task can appear with or without parameters; without them, its default values are used.
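For illustration, here is one way a consumer of this list could normalize each entry into a (name, options) pair before running it, with a bare string meaning "use the default values". This is a minimal sketch; iter_tasks is a hypothetical helper, not part of the library:

```python
def iter_tasks(config):
    """Yield (task_name, options) pairs in list order.

    A bare string means the task runs with its default values,
    represented here as an empty options dict.
    """
    for entry in config:
        if isinstance(entry, str):
            yield entry, {}
        else:
            name, options = entry
            yield name, options


config = [
    'task_a',
    ('task_b', {'task_b_opt_1': ' '}),
    ('task_a', {'task_a_opt_1': 42}),
    ('task_c', {'task_c_opt_1': 'foo', 'task_c_opt_2': 'bar'}),
    'task_b'
]

for name, options in iter_tasks(config):
    print(name, options)
```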

Now I want to extend the library to support config files. To make them easier for the end user, I was thinking of using YAML files. The code above would then become something like:

task_a:
task_b:
  task_b_opt_1: ' '
task_a:
  task_a_opt_1: 42
task_c:
  task_c_opt_1: 'foo'
  task_c_opt_2: 'bar'
task_b:

This is not even a valid YAML file, as some keys have no value. So I have two questions:

  1. How can I define empty tasks?
  2. How can I preserve the order when loading the file in Python?

If neither is possible, is there another solution for this?

Upvotes: 6

Views: 6450

Answers (3)

Daniel H

Reputation: 7463

In YAML, mappings are unordered by definition (and their keys must be unique, so task_a cannot appear twice in the same mapping). The typical solution is to use a list (sequence) of mappings instead. Values, or even keys, can be missing, in which case they are implicitly null (the equivalent of None in Python):

- task_a:
- task_b:
    task_b_opt_1: ' '
- task_a:
    task_a_opt_1: 42
- task_c:
    task_c_opt_1: 'foo'
    task_c_opt_2: 'bar'
- task_b:
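Loading this document with PyYAML's yaml.safe_load gives a list of single-key dicts in file order, starting with {'task_a': None}, {'task_b': {'task_b_opt_1': ' '}}, and so on. A sketch of turning that back into the library's original config format follows; to_config is a hypothetical helper, and the literal list stands in for the safe_load result so the snippet runs without a YAML library installed:

```python
def to_config(items):
    """Convert a list of single-key mappings into the original config list."""
    config = []
    for item in items:
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError('expected a single-key mapping, got {!r}'.format(item))
        (name, options), = item.items()
        # A key without a value is loaded as None: run with default values
        config.append(name if options is None else (name, options))
    return config


# Stand-in for yaml.safe_load() applied to the document above
loaded = [
    {'task_a': None},
    {'task_b': {'task_b_opt_1': ' '}},
    {'task_a': {'task_a_opt_1': 42}},
    {'task_c': {'task_c_opt_1': 'foo', 'task_c_opt_2': 'bar'}},
    {'task_b': None},
]
print(to_config(loaded))
```

The single-key check catches a common editing mistake: indenting an option only two spaces under "- task_b:" makes it a second key of that list item's mapping instead of an option of task_b.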

Another option is to write the tasks without options as plain strings instead of mappings, simply by removing the colon from those lines:

- task_a
- task_b:
    task_b_opt_1: ' '
- task_a:
    task_a_opt_1: 42
- task_c:
    task_c_opt_1: 'foo'
    task_c_opt_2: 'bar'
- task_b
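With this layout, yaml.safe_load returns plain strings for the option-less tasks, so only the mapping entries need converting to recover the tuple-based config. A minimal sketch, again with a literal standing in for the loaded data; to_config is a hypothetical helper:

```python
def to_config(items):
    """Strings pass through; single-key mappings become (name, options)."""
    config = []
    for item in items:
        if isinstance(item, str):
            config.append(item)
        elif isinstance(item, dict) and len(item) == 1:
            # the only (key, value) pair becomes the (name, options) tuple
            config.append(next(iter(item.items())))
        else:
            raise ValueError('unexpected entry: {!r}'.format(item))
    return config


# Stand-in for yaml.safe_load() applied to the document above
loaded = [
    'task_a',
    {'task_b': {'task_b_opt_1': ' '}},
    {'task_a': {'task_a_opt_1': 42}},
    {'task_c': {'task_c_opt_1': 'foo', 'task_c_opt_2': 'bar'}},
    'task_b',
]
print(to_config(loaded))
```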

Upvotes: 4

Anthon

Reputation: 76802

I might be reading between the lines, but I assume that your strings 'task_a', 'task_b', etc. each cause an object of a specific type (class) to be created. You can specify those object types directly using YAML tags, resulting in the following YAML document:

- !task_a
- !task_b
  task_b_opt_1: ' '
- !task_a
  task_a_opt_1: 42
- !task_c
  task_c_opt_1: foo
  task_c_opt_2: bar
- !task_b

If your task_X_opt_N options are actually positional arguments, you can use:

- !task_a
- !task_b
  - ' '
- !task_a
  - 42
- !task_c
  - foo
  - bar
- !task_b

which is, IMO, more readable (and less error-prone when end users edit these files).

Either of these formats can be loaded with:

import ruamel.yaml

class Task:
    def __init__(self, *args, **kw):
        # a task gets either positional or keyword options, never both
        if args: assert len(kw) == 0
        if kw: assert len(args) == 0
        self.args = args
        self.opt = kw

    def __repr__(self):
        retval = str(self.__class__.__name__)
        task_letter = retval[-1].lower()
        # number positional options from 1 to match the task_X_opt_N names
        for idx, k in enumerate(self.args, 1):
            retval += '\n  task_{}_opt_{}: {!r}'.format(task_letter, idx, k)
        for k in sorted(self.opt):
            retval += '\n  {}: {!r}'.format(k, self.opt[k])
        return retval

class TaskA(Task):
    pass


class TaskB(Task):
    pass


class TaskC(Task):
    pass



def default_constructor(loader, tag_suffix, node):
    # with an empty tag prefix, tag_suffix is the complete tag, e.g. '!task_a'
    assert tag_suffix.startswith('!task_')
    # pick the class from the letter following '!task_'
    if tag_suffix[6] == 'a':
        task = TaskA
    elif tag_suffix[6] == 'b':
        task = TaskB
    elif tag_suffix[6] == 'c':
        task = TaskC
    else:
        raise NotImplementedError('Unknown task type ' + tag_suffix)
    if isinstance(node, ruamel.yaml.ScalarNode):
        # empty node, e.g. "- !task_a": use default values
        assert node.value == ''
        return task()
    elif isinstance(node, ruamel.yaml.MappingNode):
        # named options become keyword arguments
        val = loader.construct_mapping(node)
        return task(**val)
    elif isinstance(node, ruamel.yaml.SequenceNode):
        # positional options become positional arguments
        val = loader.construct_sequence(node)
        return task(*val)
    else:
        raise NotImplementedError('Node: ' + str(type(node)))

# module-level API of older ruamel.yaml releases (removed in ruamel.yaml 0.18)
ruamel.yaml.add_multi_constructor('', default_constructor,
                                  constructor=ruamel.yaml.SafeConstructor)


with open('config.yaml') as fp:
    tasks = ruamel.yaml.safe_load(fp)
    for task in tasks:
        print(task)

resulting in the same output for either format:

TaskA
TaskB
  task_b_opt_1: ' '
TaskA
  task_a_opt_1: 42
TaskC
  task_c_opt_1: 'foo'
  task_c_opt_2: 'bar'
TaskB

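If for some reason you need to use the older PyYAML, you can import it and register the constructor as follows (default_constructor then has to check against yaml.ScalarNode, yaml.MappingNode and yaml.SequenceNode instead of the ruamel.yaml classes, and the file is loaded with yaml.safe_load):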

import yaml

yaml.add_multi_constructor('', default_constructor,
                           Loader=yaml.SafeLoader)

Keep in mind that PyYAML only supports YAML 1.1, not YAML 1.2.
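The tag_suffix[6] test in default_constructor only looks at a single character, so a tag such as !task_apple would also be turned into a TaskA. A stricter variant looks the full tag up in a dictionary. The sketch below isolates that dispatch step from ruamel.yaml so it runs on its own; TASK_CLASSES and build_task are hypothetical names, and the value passed in is assumed to be what the node was already constructed into (an empty string or None, a dict, or a list):

```python
class Task:
    def __init__(self, *args, **kw):
        self.args = args
        self.opt = kw


class TaskA(Task):
    pass


class TaskB(Task):
    pass


class TaskC(Task):
    pass


# Hypothetical registry: full YAML tag -> task class
TASK_CLASSES = {'!task_a': TaskA, '!task_b': TaskB, '!task_c': TaskC}


def build_task(tag, value):
    """Create a task from its tag and the already-constructed node value.

    value is '' or None for an empty node, a dict for a mapping node
    and a list for a sequence node.
    """
    try:
        cls = TASK_CLASSES[tag]
    except KeyError:
        raise NotImplementedError('Unknown task type ' + tag) from None
    if value is None or value == '':
        return cls()
    if isinstance(value, dict):
        return cls(**value)
    if isinstance(value, list):
        return cls(*value)
    raise TypeError('unsupported value for {}: {!r}'.format(tag, value))


print(type(build_task('!task_c', {'task_c_opt_1': 'foo'})).__name__)
```

Inside default_constructor, the isinstance branches would then only construct the value (node.value, loader.construct_mapping(node) or loader.construct_sequence(node)) and pass it to build_task together with tag_suffix.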

Upvotes: 1

sphere

Reputation: 1350

First, a quick comment: you are using a list, not an array, and it contains tuples.

Anyway, you could use the yaml module (PyYAML) for this. I also changed the tuples to lists, as YAML has no native tuple type.

from yaml import dump

config = [
    'task_a',
    ['task_b', {'task_b_opt_1': ' '}],
    ['task_a', {'task_a_opt_1': 42}],
    ['task_c', {'task_c_opt_1': 'foo', 'task_c_opt_2': 'bar'}],
    'task_b'
]
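# default_flow_style=None reproduces the output below; since PyYAML 5.1
# the default is block style for all collections
print(dump(config, default_flow_style=None))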

This prints:

- task_a
- - task_b
  - {task_b_opt_1: ' '}
- - task_a
  - {task_a_opt_1: 42}
- - task_c
  - {task_c_opt_1: foo, task_c_opt_2: bar}
- task_b
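Because the tuples were changed to lists, loading this output back (for example with yaml.safe_load) yields lists rather than tuples. If the library expects tuples, a one-line conversion restores the original structure. A sketch, with a literal standing in for the loaded data:

```python
# Stand-in for yaml.safe_load() applied to the dumped YAML above
loaded = [
    'task_a',
    ['task_b', {'task_b_opt_1': ' '}],
    ['task_a', {'task_a_opt_1': 42}],
    ['task_c', {'task_c_opt_1': 'foo', 'task_c_opt_2': 'bar'}],
    'task_b',
]

# Turn each [name, options] pair back into a (name, options) tuple
config = [tuple(entry) if isinstance(entry, list) else entry for entry in loaded]
print(config)
```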

Upvotes: 1
