Reputation: 4203
I am trying to write the following class out to YAML and then parse it back:
import numpy as np
class MyClass(object):
    YAMLTag = '!MyClass'
    def __init__(self, name, times, zeros):
        self.name = name
        self._T = np.array(times)
        self._zeros = np.array(zeros)
The YAML file looks like
!MyClass:
  name: InstanceId
  times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
  zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]
To write, I have added to the class two methods
    def toDict(self):
        return {'name' : self.name,
                'times' : [float(t) for t in self._T],
                'zeros' : [float(t) for t in self._zeros]}
    @staticmethod
    def ToYAML(dumper, data):
        return dumper.represent_dict({data.YAMLTag : data.toDict()})
and to read, the method
    @staticmethod
    def FromYAML(loader, node):
        nodeMap = loader.construct_mapping(node)
        return MyClass(name = nodeMap['name'],
                       times = nodeMap['times'],
                       zeros = nodeMap['zeros'])
and following the PyYAML documentation, I added the following snippet to the same Python file, myClass.py:
import yaml
yaml.add_constructor(MyClass.YAMLTag, MyClass.FromYAML)
yaml.add_representer(MyClass, MyClass.ToYAML)
Now, the writing seems to work ok, but reading the YAML, the code
loader.construct_mapping(node)
seems to return the dictionary with empty data:
{'zeros': [], 'name': 'InstanceId', 'times': []}
How should I fix the reader to be able to do this properly? Or perhaps I am not writing something out right? I spent a long time looking at PyYAML documentation and debugging through how the package is implemented but cannot figure out a way to parse out a complicated structure, and the only example I seemed to find has a 1-line class which parses out easily.
Related: YAML parsing and Python
UPDATE
Manually parsing the node as follows worked:
name, times, zeros = None, None, None
for key, value in node.value:
    elementName = loader.construct_scalar(key)
    if elementName == 'name':
        name = loader.construct_scalar(value)
    elif elementName == 'times':
        times = loader.construct_sequence(value)
    elif elementName == 'zeros':
        zeros = loader.construct_sequence(value)
    else:
        raise ValueError('Unexpected YAML key %s' % elementName)
But the question still stands, is there a non-manual way to do this?
Upvotes: 6
Reputation: 146
In addition to the answers above, all of which are good, there is a Python package that constructs objects from YAML, JSON, or dicts, and it is actively being developed and expanded. (Full disclosure: I am a co-author of this package; see here.)
Install:
pip install pickle-rick
Use:
Define a YAML or JSON string (or file).
BASIC:
  text: test
  dictionary:
    one: 1
    two: 2
  number: 2
  list:
    - one
    - two
    - four
    - name: John
      age: 20
  USERNAME:
    type: env
    load: USERNAME
  callable_lambda:
    type: lambda
    load: "lambda: print('hell world!')"
  datenow:
    type: lambda
    import:
      - "from datetime import datetime as dd"
    load: "lambda: print(dd.utcnow().strftime('%Y-%m-%d'))"
  test_function:
    type: function
    name: test_function
    args:
      x: 7
      y: null
      s: hello world
      any:
        - 1
        - hello
    import:
      - "math"
    load: >
      def test(x, y, s, any):
        print(math.e)
        iii = 111
        print(iii)
        print(x,s)
        if y:
          print(type(y))
        else:
          print(y)
        for i in any:
          print(i)
Then use it as an object.
>> from pickle_rick import PickleRick
>> config = PickleRick('./config.yaml', deep=True, load_lambda=True)
>> config.BASIC.dictionary
{'one' : 1, 'two' : 2}
>> config.BASIC.callable_lambda()
hell world!
You can define Python functions, load additional data from other files or REST APIs, environmental variables, and then write everything out to YAML or JSON again.
This works especially well when building systems that require structured configuration files, or in notebooks as interactive structures.
A security note: loading a file can execute any code it contains, so only load files that you trust and whose complete contents you know.
The package is called PickleRick and is available here:
Upvotes: 1
Reputation: 76568
There are multiple problems with your approach, even without taking into account that you should read PEP 8, the style guide for Python code, in particular the parts on Method Names and Instance Variables.
As you indicate you have looked long at the PyYAML documentation, you cannot have failed to notice that yaml.load() is unsafe. It is also almost never necessary to use it, certainly not if you write your own representers and constructors.
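As a rough illustration of that point (a minimal sketch; the two document strings are made up for this example and are not from the question): yaml.safe_load() only knows the standard YAML tags, so it refuses a document that asks for an arbitrary Python call, while ordinary data loads unchanged.

```python
import yaml

# Hypothetical untrusted input: a tag that asks the loader to call os.getcwd().
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError:
    # The safe loader has no constructor registered for python/* tags.
    rejected = True

# Plain data is unaffected by the restriction.
plain = yaml.safe_load("times: [0.0, 0.25]")

print(rejected)
print(plain)
```

Custom tags such as !MyClass still work with safe_load() as long as their constructor is registered on yaml.SafeLoader. As far as I know, PyYAML 6.0 and later also require an explicit Loader argument to yaml.load().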
You use dumper.represent_dict({data.YAMLTag : data.toDict()}), which dumps an object as a key-value pair. What you want to do, at least if you want to have a tag in your output YAML, is dumper.represent_mapping(data.YAMLTag, data.toDict()). This will get you output of the form:
!MyClass
name: InstanceId
times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]
i.e. a tagged mapping instead of your key-value pair, where the value is a mapping. (And I would have expected the first line to be '!MyClass': to make sure the scalar that starts with an exclamation mark is not interpreted as a tag).
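The difference between the two calls can be checked side by side. This is only a sketch: the Tagged class and the two dumper subclasses are hypothetical names introduced for the comparison, and subclassing yaml.SafeDumper is just a way to register both representers without them overwriting each other.

```python
import yaml

class Tagged(object):
    yaml_tag = u'!Tagged'  # hypothetical tag for this example

    def __init__(self, name):
        self.name = name

def as_key_value(dumper, data):
    # The question's approach: the tag text becomes an ordinary string key.
    return dumper.represent_dict({data.yaml_tag: {'name': data.name}})

def as_tagged_mapping(dumper, data):
    # The suggested approach: the mapping node itself carries the tag.
    return dumper.represent_mapping(data.yaml_tag, {'name': data.name})

class KeyValueDumper(yaml.SafeDumper):
    pass

class TaggedDumper(yaml.SafeDumper):
    pass

KeyValueDumper.add_representer(Tagged, as_key_value)
TaggedDumper.add_representer(Tagged, as_tagged_mapping)

key_value_text = yaml.dump(Tagged('InstanceId'), Dumper=KeyValueDumper)
tagged_text = yaml.dump(Tagged('InstanceId'), Dumper=TaggedDumper)

print(key_value_text)
print(tagged_text)
```

Only the second output starts with the tag itself; the first is a plain mapping whose single key happens to be the string '!Tagged', so no constructor registered for that tag would ever be called when reading it back.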
Constructing complex objects that are potentially self-referential (directly or indirectly) has to be done in two steps using a generator (the PyYAML code calls this in the correct way for you). In your code you assume that you have all the parameters to create an instance of MyClass. But if there is self-reference, these parameters have to include that instance itself, and it is not created yet. The proper example code in the PyYAML code base for this is construct_yaml_object() in constructor.py:
def construct_yaml_object(self, node, cls):
    data = cls.__new__(cls)
    yield data
    if hasattr(data, '__setstate__'):
        state = self.construct_mapping(node, deep=True)
        data.__setstate__(state)
    else:
        state = self.construct_mapping(node)
        data.__dict__.update(state)
You don't have to use .__new__(), but you should take deep=True into account as explained here.
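What deep=True changes can be seen by recording the list length at the moment the constructor runs. The following is a minimal sketch, assuming a made-up !Record tag and two throwaway loader subclasses; it is not taken from the PyYAML sources.

```python
import yaml

DOC = "!Record\ntimes: [0.0, 0.25, 0.5]\n"  # hypothetical document
lengths_seen = {}

def shallow_constructor(loader, node):
    mapping = loader.construct_mapping(node)  # deep defaults to False
    # The nested list exists but has not been filled in yet.
    lengths_seen['shallow'] = len(mapping['times'])
    return mapping

def deep_constructor(loader, node):
    mapping = loader.construct_mapping(node, deep=True)
    # With deep=True the nested list is complete at this point.
    lengths_seen['deep'] = len(mapping['times'])
    return mapping

class ShallowLoader(yaml.SafeLoader):
    pass

class DeepLoader(yaml.SafeLoader):
    pass

ShallowLoader.add_constructor(u'!Record', shallow_constructor)
DeepLoader.add_constructor(u'!Record', deep_constructor)

shallow_result = yaml.load(DOC, Loader=ShallowLoader)
deep_result = yaml.load(DOC, Loader=DeepLoader)

print(lengths_seen)
print(shallow_result == deep_result)
```

Both loaders end up returning the same dictionary, because PyYAML fills the nested list once the document is finished. The shallow constructor simply looks at it too early, which matches the empty times and zeros lists reported in the question.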
In general it is also useful to have a __repr__() that allows you to check the object that you load, with something more expressive than <__main__.MyClass object at 0x12345>.
The imports:
from __future__ import print_function
import sys
import yaml
try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
import numpy as np
To check the correct workings of self-referential versions I added the self._ref attribute to the class:
class MyClass(object):
    YAMLTag = u'!MyClass'
    def __init__(self, name=None, times=[], zeros=[], ref=None):
        self.update(name, times, zeros, ref)
    def update(self, name, times, zeros, ref):
        self.name = name
        self._T = np.array(times)
        self._zeros = np.array(zeros)
        self._ref = ref
    def toDict(self):
        return dict(name=self.name,
                    times=self._T.tolist(),
                    zeros=self._zeros.tolist(),
                    ref=self._ref,
                    )
    def __repr__(self):
        return "{}(name={}, times={}, zeros={})".format(
            self.__class__.__name__,
            self.name,
            self._T.tolist(),
            self._zeros.tolist(),
        )
    def update_self_ref(self, ref):
        self._ref = ref
The representer and constructor "methods":
    @staticmethod
    def to_yaml(dumper, data):
        return dumper.represent_mapping(data.YAMLTag, data.toDict())
    @staticmethod
    def from_yaml(loader, node):
        value = MyClass()
        yield value
        node_map = loader.construct_mapping(node, deep=True)
        value.update(**node_map)
yaml.add_representer(MyClass, MyClass.to_yaml, Dumper=yaml.SafeDumper)
yaml.add_constructor(MyClass.YAMLTag, MyClass.from_yaml, Loader=yaml.SafeLoader)
And how to use it:
instance = MyClass('InstanceId',
                   [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0],
                   [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03])
instance.update_self_ref(instance)
buf = StringIO()
yaml.safe_dump(instance, buf)
yaml_str = buf.getvalue()
print(yaml_str)
data = yaml.safe_load(yaml_str)
print(data)
print(id(data), id(data._ref))
The above combined gives:
&id001 !MyClass
name: InstanceId
ref: *id001
times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]
MyClass(name=InstanceId, times=[0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0], zeros=[0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03])
139737236881744 139737236881744
As you can see, the ids of data and data._ref are the same after loading.
The above throws an error if you use the simplistic approach in your constructor, by just using loader.construct_mapping(node, deep=True) and returning the finished object.
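That failure can be reproduced without numpy. In this sketch the Node class, the !Node tag and the two loader subclasses are hypothetical stand-ins; the same self-referential document is loaded once with a plain constructor and once with the two-step generator.

```python
import yaml
from yaml.constructor import ConstructorError

class Node(object):
    """Hypothetical stand-in for a class that can refer to itself."""

DOC = "&id001 !Node\nname: InstanceId\nref: *id001\n"

def simple_constructor(loader, node):
    # Needs the finished mapping first, but the mapping contains the
    # object that is still being built.
    state = loader.construct_mapping(node, deep=True)
    obj = Node()
    obj.__dict__.update(state)
    return obj

def two_step_constructor(loader, node):
    obj = Node()
    yield obj  # hand the empty object to the loader first
    state = loader.construct_mapping(node, deep=True)
    obj.__dict__.update(state)

class SimpleLoader(yaml.SafeLoader):
    pass

class TwoStepLoader(yaml.SafeLoader):
    pass

SimpleLoader.add_constructor(u'!Node', simple_constructor)
TwoStepLoader.add_constructor(u'!Node', two_step_constructor)

try:
    yaml.load(DOC, Loader=SimpleLoader)
    simple_failed = False
except ConstructorError:
    simple_failed = True

loaded = yaml.load(DOC, Loader=TwoStepLoader)
print(simple_failed)
print(loaded.ref is loaded)
```

The generator version works because the loader already has an object to hand out when it meets the alias *id001 while building the mapping.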
Upvotes: 4
Reputation: 4336
Instead of
nodeMap = loader.construct_mapping(node)
try this:
nodeMap = loader.construct_mapping(node, deep=True)
Also, you have a little mistake in your YAML file:
!MyClass:
The colon at the end does not belong there.
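Putting both fixes together, a minimal sketch might look like the following. It assumes plain lists instead of numpy arrays to stay self-contained, and the Curve class and CurveLoader subclass are hypothetical names, not part of the question's code.

```python
import yaml

class Curve(object):
    yaml_tag = u'!MyClass'

    def __init__(self, name, times, zeros):
        self.name = name
        self.times = list(times)
        self.zeros = list(zeros)

def curve_from_yaml(loader, node):
    # deep=True makes sure the nested lists are filled before use.
    node_map = loader.construct_mapping(node, deep=True)
    return Curve(**node_map)

class CurveLoader(yaml.SafeLoader):
    pass

CurveLoader.add_constructor(Curve.yaml_tag, curve_from_yaml)

# The tag stands on its own line, without a trailing colon.
text = """\
!MyClass
name: InstanceId
times: [0.0, 0.25, 0.5]
zeros: [0.03, 0.03, 0.04]
"""

curve = yaml.load(text, Loader=CurveLoader)
print(curve.name, curve.times, curve.zeros)
```

This plain-return constructor is enough here because the document contains no self-references.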
Upvotes: 1