Aenaon

Reputation: 3573

Numpy array from YAML

I have a configuration file in YAML which contains strings, floats, integers and a list. When the YAML is loaded, I would like the list to be returned as a numpy array. So, for example, if the YAML is as follows:

name: 'John Doe'
age: 20
score:
  -- 19
   - 45
  -- 21
   - 12
  -- 32
   - 13

and I read this by

import yaml

def read(CONFIG_FILE):
    with open(CONFIG_FILE) as c:
        return yaml.safe_load(c)

config = read(r'path\to\yml')

then I would like config['score'] to be a numpy.array instead of a list. Of course, this could easily be done outside YAML with something like numpy.array(config['score']), but I want to avoid that.

I have tried setting the tag as described in the documentation (https://pyyaml.org/wiki/PyYAMLDocumentation) but I cannot make it work. For example, the following fails:

score:!!python/object:numpy.array
  -- 19
   - 45
  -- 21
   - 12
  -- 32
   - 13

Changing the tag to !!python/module:numpy.array or !!python/name:numpy.array doesn't work either.

How can I make this work? I am using Python 3.

Upvotes: 2

Views: 6263

Answers (1)

Anthon

Reputation: 76742

Dumping a numpy array that holds your data gives you a vastly more complex YAML file than what you get by just adding a tag. I therefore recommend that you define a tag of your own that loads the data as you have it and converts it to numpy on the fly. That way you don't have to walk over the loaded structure afterwards to find score or its value.

config.yaml:

name: 'John Doe'
age: 20
score: !2darray
  -- 19
   - 45
  -- 21
   - 12
  -- 32
   - 13

You also have to realize that the value for score in that file is not a sequence but a plain multi-line scalar, which gets loaded as the string '-- 19 - 45 -- 21 - 12 -- 32 - 13'. The from_yaml method therefore has to parse that string itself:

from pathlib import Path

import numpy
import ruamel.yaml

config_file = Path('config.yaml')

yaml = ruamel.yaml.YAML(typ='safe')

@yaml.register_class
class Array:
    yaml_tag = '!2darray'

    @classmethod
    def from_yaml(cls, constructor, node):
        # node.value is the folded scalar: '-- 19 - 45 -- 21 - 12 -- 32 - 13'
        array = []
        for x in node.value.split():
            if x == '--':
                # '--' starts a new row
                sub_array = []
                array.append(sub_array)
                continue
            if x == '-':
                # '-' only separates values within a row
                continue
            sub_array.append(int(x))
        return numpy.array(array)

data = yaml.load(config_file)
print(type(data['score']))
print(data)

which gives:

<class 'numpy.ndarray'>
{'name': 'John Doe', 'age': 20, 'score': array([[19, 45],
       [21, 12],
       [32, 13]])}
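
To see what the token loop in from_yaml does without installing ruamel.yaml or numpy, here is a minimal standalone sketch. The helper names fold_plain_scalar and parse_tokens are made up for this illustration, and the folding function only roughly imitates how YAML folds a plain multi-line scalar (it assumes there are no blank lines in the value):

```python
def fold_plain_scalar(lines):
    # Rough imitation of YAML line folding for a plain multi-line scalar:
    # strip each line and join the pieces with single spaces.
    # Assumption: no blank lines inside the scalar.
    return ' '.join(line.strip() for line in lines)


def parse_tokens(text):
    # Same idea as the loop in from_yaml, minus the final numpy.array() call.
    rows = []
    for token in text.split():
        if token == '--':   # starts a new row
            rows.append([])
            continue
        if token == '-':    # only separates values within a row
            continue
        rows[-1].append(int(token))
    return rows


raw_lines = ['-- 19', ' - 45', '-- 21', ' - 12', '-- 32', ' - 13']
folded = fold_plain_scalar(raw_lines)
rows = parse_tokens(folded)
print(folded)
print(rows)
```

The nested list in rows is what gets handed to numpy.array() in the real from_yaml method.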

If the value for score in your input were a real sequence of sequences, each - would need a space after it, because only then is it interpreted as a sequence entry indicator:

name: 'John Doe'
age: 20
score: !2darray
  - - 19
    - 45
  - - 21
    - 12
  - - 32
    - 13

With that input, you need to adapt the from_yaml method:

@yaml.register_class
class Array:
    yaml_tag = '!2darray'

    @classmethod
    def from_yaml(cls, constructor, node):
        array = constructor.construct_sequence(node, deep=True)
        return numpy.array(array)

This gives exactly the same output as before.
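
Since the question uses PyYAML rather than ruamel.yaml, the same idea can probably be sketched there as well. The following assumes PyYAML and numpy are installed and that the input is the sequence-of-sequences form with the !2darray tag; the function name construct_2darray is just an illustrative choice:

```python
import numpy
import yaml  # PyYAML


def construct_2darray(loader, node):
    # Build the nested lists first; deep=True constructs the inner
    # sequences before the outer list is handed to numpy.
    return numpy.array(loader.construct_sequence(node, deep=True))


# Register the tag on SafeLoader only, so that yaml.safe_load picks it up.
yaml.add_constructor('!2darray', construct_2darray, Loader=yaml.SafeLoader)

document = """\
name: 'John Doe'
age: 20
score: !2darray
  - - 19
    - 45
  - - 21
    - 12
  - - 32
    - 13
"""

config = yaml.safe_load(document)
print(type(config['score']))
print(config['score'].shape)
```

Registering the constructor on SafeLoader keeps the loader restricted to plain data plus this one tag, instead of relying on the !!python/... tags, which need an unsafe loader.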

Upvotes: 1
