Reputation: 3573
I have a configuration file in YAML which contains strings, floats, integers and a list. When the YAML is loaded, I would like the list to be returned as a numpy array. So, for example, if the YAML is as follows:
name: 'John Doe'
age: 20
score:
  -- 19
   - 45
  -- 21
   - 12
  -- 32
   - 13
and I read this by
import yaml
def read(CONFIG_FILE):
    with open(CONFIG_FILE) as c:
        return yaml.load(c)
config = read('path\to\yml')
then I would like config['score'] to be typed as a numpy.array instead of a list. Of course, this could easily be done outside YAML with something like numpy.array(config['score']), but I want to avoid that.
I have tried setting the tag as described in the documentation (https://pyyaml.org/wiki/PyYAMLDocumentation) but I cannot make it work. For example, the following fails:
score:!!python/object:numpy.array
  -- 19
   - 45
  -- 21
   - 12
  -- 32
   - 13
Changing the tag to !!python/module:numpy.array or !!python/name:numpy.array doesn't work either.
How can I make this work? I am using Python 3.
Upvotes: 2
Views: 6263
Reputation: 76742
Dumping a numpy array with the data that you get will give you a
vastly more complex YAML file than what you can get by just adding a
tag. I therefore recommend that you define a tag of your own that
causes the data as you have it to load, and then convert it to numpy on
the fly. That way you don't have to walk over the resulting loaded structure to find score or its value.
config.yaml:
name: 'John Doe'
age: 20
score: !2darray
  -- 19
   - 45
  -- 21
   - 12
  -- 32
   - 13
You also have to realize that the value for score in that file is a plain multi-line
scalar, which gets loaded as the string '-- 19 - 45 -- 21 - 12 -- 32 - 13'. The from_yaml method therefore has to split that string itself:
import ruamel.yaml
from pathlib import Path
import numpy

config_file = Path('config.yaml')
yaml = ruamel.yaml.YAML(typ='safe')

@yaml.register_class
class Array:
    yaml_tag = '!2darray'

    @classmethod
    def from_yaml(cls, constructor, node):
        # node.value is the plain scalar as a single string
        array = []
        for x in node.value.split():
            if x == '--':
                # '--' starts a new row
                sub_array = []
                array.append(sub_array)
                continue
            if x == '-':
                # '-' only separates values within a row
                continue
            sub_array.append(int(x))
        return numpy.array(array)

data = yaml.load(config_file)
print(type(data['score']))
print(data)
which gives:
<class 'numpy.ndarray'>
{'name': 'John Doe', 'age': 20, 'score': array([[19, 45],
       [21, 12],
       [32, 13]])}
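The token walk inside from_yaml can also be tried in isolation, without YAML or numpy. The following is only a sketch: parse_rows is a hypothetical helper name, not part of ruamel.yaml, and it assumes every value is an integer and that the string starts with '--'.

```python
def parse_rows(text):
    """Split a string like '-- 19 - 45 -- 21 - 12' into a list of integer rows."""
    rows = []
    for token in text.split():
        if token == '--':
            # '--' opens a new row
            rows.append([])
        elif token == '-':
            # '-' only separates values within the current row
            continue
        else:
            if not rows:
                # assumption: a value before the first '--' is treated as an error
                raise ValueError('value %r appears before the first "--"' % token)
            rows[-1].append(int(token))
    return rows


rows = parse_rows('-- 19 - 45 -- 21 - 12 -- 32 - 13')
print(rows)
```

Unlike the loop in from_yaml above, this variant raises a ValueError instead of a NameError when the scalar does not start with '--'.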
If the value for score in your input were a sequence of sequences, there would have to be
a space after each -, because only then is it interpreted as
a sequence entry indicator:
name: 'John Doe'
age: 20
score: !2darray
- - 19
  - 45
- - 21
  - 12
- - 32
  - 13
If that is the input, then you need to adapt the from_yaml method:
@yaml.register_class
class Array:
    yaml_tag = '!2darray'

    @classmethod
    def from_yaml(cls, constructor, node):
        # deep=True constructs the nested sequences before conversion
        array = constructor.construct_sequence(node, deep=True)
        return numpy.array(array)
This gives exactly the same output as before.
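Since the question uses PyYAML rather than ruamel.yaml, the same idea for the sequence-of-sequences input might look roughly like this there. This is a sketch, not a tested drop-in: ArrayLoader and construct_2darray are names made up for illustration, and the document is loaded from an inline string instead of config.yaml.

```python
import numpy
import yaml


class ArrayLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the custom tag is not registered on SafeLoader itself."""


def construct_2darray(loader, node):
    # deep=True constructs the nested sequences before conversion
    rows = loader.construct_sequence(node, deep=True)
    return numpy.array(rows)


# register the hypothetical !2darray tag on the subclass only
ArrayLoader.add_constructor('!2darray', construct_2darray)

DOCUMENT = """\
name: 'John Doe'
age: 20
score: !2darray
- - 19
  - 45
- - 21
  - 12
- - 32
  - 13
"""

config = yaml.load(DOCUMENT, Loader=ArrayLoader)
print(type(config['score']))
```

Registering the constructor on a subclass keeps plain yaml.safe_load calls elsewhere in a program unaffected by the extra tag.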
Upvotes: 1