gt6989b

Reputation: 4203

Python parsing class from YAML

I am trying to write the following class out to YAML and then parse it back:

import numpy as np
class MyClass(object):
    YAMLTag = '!MyClass'

    def __init__(self, name, times, zeros):
        self.name   = name
        self._T     = np.array(times)
        self._zeros = np.array(zeros)

The YAML file looks like

!MyClass:
  name: InstanceId
  times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
  zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]

To write, I have added two methods to the class:

def toDict(self):
    return {'name'  : self.name,
            'times' : [float(t) for t in self._T],
            'zeros' : [float(t) for t in self._zeros]}
@staticmethod
def ToYAML(dumper, data):
    return dumper.represent_dict({data.YAMLTag : data.toDict()})

and to read, the method

@staticmethod
def FromYAML(loader, node):
    nodeMap = loader.construct_mapping(node)
    return MyClass(name  = nodeMap['name'],
                   times = nodeMap['times'],
                   zeros = nodeMap['zeros'])

and, following the YAML documentation, I added the following snippet to the same Python file, myClass.py:

import yaml

yaml.add_constructor(MyClass.YAMLTag, MyClass.FromYAML)
yaml.add_representer(MyClass,         MyClass.ToYAML)

Now, writing seems to work fine, but when reading the YAML, the call

loader.construct_mapping(node)

seems to return the dictionary with empty data:

{'zeros': [], 'name': 'InstanceId', 'times': []}

How should I fix the reader so that it does this properly? Or am I perhaps writing something out incorrectly? I spent a long time reading the PyYAML documentation and stepping through the package in a debugger, but I cannot figure out how to parse a more complicated structure. The only example I found uses a one-line class that parses easily.


Related: YAML parsing and Python


UPDATE

Manually parsing the node as follows worked:

name, times, zeros = None, None, None
for key, value in node.value:
    elementName = loader.construct_scalar(key)
    if elementName == 'name':
        name = loader.construct_scalar(value)
    elif elementName == 'times':
        times = loader.construct_sequence(value)
    elif elementName == 'zeros':
        zeros = loader.construct_sequence(value)
    else:
        raise ValueError('Unexpected YAML key %s' % elementName)

But the question still stands: is there a non-manual way to do this?

Upvotes: 6

Views: 18147

Answers (3)

amateurjustin

Reputation: 146

The other answers are all good. In addition, there is a Python package that constructs objects from YAML, JSON, or dicts, and it is actively being developed and expanded. (Full disclosure: I am a co-author of this package, see here.)

Install:

pip install pickle-rick

Use:

Define a YAML or JSON string (or file).

BASIC:
 text: test
 dictionary:
   one: 1
   two: 2
 number: 2
 list:
   - one
   - two
   - four
   - name: John
     age: 20
 USERNAME:
   type: env
   load: USERNAME
 callable_lambda:
   type: lambda
   load: "lambda: print('hell world!')"
 datenow:
   type: lambda
   import:
     - "from datetime import datetime as dd"
   load: "lambda: print(dd.utcnow().strftime('%Y-%m-%d'))"
 test_function:
   type: function
   name: test_function
   args:
     x: 7
     y: null
     s: hello world
     any:
       - 1
       - hello
   import:
     - "math"
   load: >
     def test(x, y, s, any):
       print(math.e)
       iii = 111
       print(iii)
       print(x,s)
       if y:
         print(type(y))
       else:
         print(y)
       for i in any:
         print(i)

Then use it as an object.

>> from pickle_rick import PickleRick

>> config = PickleRick('./config.yaml', deep=True, load_lambda=True)

>> config.BASIC.dictionary
{'one' : 1, 'two' : 2}

>> config.BASIC.callable_lambda()
hell world!

You can define Python functions, load additional data from other files or REST APIs, environmental variables, and then write everything out to YAML or JSON again.

This works especially well when building systems that require structured configuration files, or in notebooks as interactive structures.

A security note: loading a file can execute arbitrary code, so only load files you trust and whose complete contents you know.

The package is called PickleRick and is available here:

Upvotes: 1

Anthon

Reputation: 76568

There are multiple problems with your approach, even leaving aside that you should read PEP 8, the style guide for Python code, in particular the sections on Method Names and Instance Variables.

  1. As you indicate you have looked long at the Python documentation, you cannot have failed to notice that yaml.load() is unsafe. It is also almost never necessary to use it, certainly not if you write your own representers and constructors.

  2. You use dumper.represent_dict({data.YAMLTag : data.toDict()}) which dumps an object as a key-value pair. What you want to do, at least if you want to have a tag in your output YAML is: dumper.represent_mapping(data.YAMLTag, data.toDict()). This will get you output of the form:

    !MyClass
    name: InstanceId
    times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
    zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]
    

    i.e. a tagged mapping instead of your key-value pair, where the value is a mapping. (And I would have expected the first line to be '!MyClass': to make sure the scalar that starts with an exclamation mark is not interpreted as a tag).

  3. Constructing a complex object that is potentially self-referential (directly or indirectly) has to be done in two steps using a generator (the PyYAML code calls this in the correct way for you). In your code you assume that you have all the parameters needed to create an instance of MyClass. But if there is a self-reference, these parameters have to include the instance itself, which has not been created yet. The proper example code in the PyYAML code base for this is construct_yaml_object() in constructor.py:

    def construct_yaml_object(self, node, cls):
        data = cls.__new__(cls)
        yield data
        if hasattr(data, '__setstate__'):
            state = self.construct_mapping(node, deep=True)
            data.__setstate__(state)
        else:
            state = self.construct_mapping(node)
            data.__dict__.update(state)
    

    You don't have to use .__new__(), but you should take deep=True into account as explained here
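To make the difference described in point 2 concrete, here is a minimal sketch that dumps the same object both ways. The class is a stripped-down stand-in for the one in the question (no numpy), and KeyValueDumper, TaggedDumper, as_key_value and as_tagged_mapping are names invented for this illustration; only represent_dict, represent_mapping, add_representer and yaml.dump are PyYAML calls.

```python
import yaml


class MyClass(object):
    YAMLTag = '!MyClass'

    def __init__(self, name, times):
        self.name = name
        self.times = times

    def toDict(self):
        return {'name': self.name, 'times': list(self.times)}


def as_key_value(dumper, data):
    # Approach from the question: an untagged mapping whose only key
    # is the *string* '!MyClass'.
    return dumper.represent_dict({data.YAMLTag: data.toDict()})


def as_tagged_mapping(dumper, data):
    # Suggested approach: the mapping itself carries the tag !MyClass.
    return dumper.represent_mapping(data.YAMLTag, data.toDict())


# Separate (hypothetical) dumper subclasses keep the two representers apart
# and avoid touching PyYAML's global registries.
class KeyValueDumper(yaml.SafeDumper):
    pass


class TaggedDumper(yaml.SafeDumper):
    pass


KeyValueDumper.add_representer(MyClass, as_key_value)
TaggedDumper.add_representer(MyClass, as_tagged_mapping)

instance = MyClass('InstanceId', [0.0, 0.25, 0.5])
key_value_out = yaml.dump(instance, Dumper=KeyValueDumper)
tagged_out = yaml.dump(instance, Dumper=TaggedDumper)

print(key_value_out)
print(tagged_out)
```

Only the second output starts with the tag, so only that one can be routed to a constructor registered for !MyClass when loading. The first loads back as an ordinary dict with a single string key.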

In general it also is useful to have a __repr__() that allows you to check the object that you load, with something more expressive than <__main__.MyClass object at 0x12345>

The imports:

from __future__ import print_function

import sys
import yaml
try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3: cStringIO no longer exists
import numpy as np

To check the correct workings of self-referential versions I added the self._ref attribute to the class:

class MyClass(object):
    YAMLTag = u'!MyClass'

    def __init__(self, name=None, times=[], zeros=[], ref=None):
        self.update(name, times, zeros, ref)

    def update(self, name, times, zeros, ref):
        self.name = name
        self._T = np.array(times)
        self._zeros = np.array(zeros)
        self._ref = ref

    def toDict(self):
        return dict(name=self.name,
                    times=self._T.tolist(),
                    zeros=self._zeros.tolist(),
                    ref=self._ref,
        )

    def __repr__(self):
        return "{}(name={}, times={}, zeros={})".format(
            self.__class__.__name__,
            self.name,
            self._T.tolist(),
            self._zeros.tolist(),
        )

    def update_self_ref(self, ref):
        self._ref = ref

The representer and constructor "methods":

    @staticmethod
    def to_yaml(dumper, data):
        return dumper.represent_mapping(data.YAMLTag, data.toDict())

    @staticmethod
    def from_yaml(loader, node):
        value = MyClass()
        yield value
        node_map = loader.construct_mapping(node, deep=True)
        value.update(**node_map)


yaml.add_representer(MyClass, MyClass.to_yaml, Dumper=yaml.SafeDumper)
yaml.add_constructor(MyClass.YAMLTag, MyClass.from_yaml, Loader=yaml.SafeLoader)

And how to use it:

instance = MyClass('InstanceId',
                   [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0],
                   [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03])
instance.update_self_ref(instance)

buf = StringIO()
yaml.safe_dump(instance, buf)

yaml_str = buf.getvalue()
print(yaml_str)


data = yaml.safe_load(yaml_str)
print(data)
print(id(data), id(data._ref))

Combined, the above gives:

&id001 !MyClass
name: InstanceId
ref: *id001
times: [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
zeros: [0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]

MyClass(name=InstanceId, times=[0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0], zeros=[0.03, 0.03, 0.04, 0.03, 0.03, 0.02, 0.03]) 
139737236881744 139737236881744

As you can see the ids of data and data._ref are the same after loading.

Loading the above throws an error if you use the simplistic approach in your constructor, i.e. calling loader.construct_mapping(node, deep=True) and returning the finished object directly instead of yielding it first.
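The difference between the one-step and the two-step constructor on a self-referential document can be reproduced with a small sketch. The Node class, the !Node tag, and the two loader subclasses are hypothetical names chosen for this illustration and do not come from the code above; numpy is left out to keep it short.

```python
import yaml

# A document whose top-level object refers to itself through an alias.
YAML_TEXT = """\
&id001 !Node
name: root
ref: *id001
"""


class Node(object):
    def __init__(self, name=None, ref=None):
        self.name = name
        self.ref = ref


def one_step(loader, node):
    # Needs every value up front, including the object being built.
    mapping = loader.construct_mapping(node, deep=True)
    return Node(**mapping)


def two_step(loader, node):
    # Hand out the empty object first, fill it in afterwards.
    value = Node()
    yield value
    mapping = loader.construct_mapping(node, deep=True)
    value.name = mapping['name']
    value.ref = mapping['ref']


class OneStepLoader(yaml.SafeLoader):
    pass


class TwoStepLoader(yaml.SafeLoader):
    pass


OneStepLoader.add_constructor('!Node', one_step)
TwoStepLoader.add_constructor('!Node', two_step)

try:
    yaml.load(YAML_TEXT, Loader=OneStepLoader)
    one_step_failed = False
except yaml.YAMLError as exc:
    one_step_failed = True
    print('one-step constructor failed:', type(exc).__name__)

loaded = yaml.load(YAML_TEXT, Loader=TwoStepLoader)
print(loaded.ref is loaded)
```

For documents without self-references the one-step form with deep=True works; the generator form is what makes the alias resolvable, because the object already exists when the alias is encountered.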

Upvotes: 4

tinita

Reputation: 4336

Instead of

nodeMap = loader.construct_mapping(node)

try this:

nodeMap = loader.construct_mapping(node, deep=True)

Also, there is a small mistake in your YAML file:

!MyClass:

The colon at the end does not belong there.
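Both fixes can be checked together with a short sketch. It assumes the corrected YAML (no trailing colon after the tag) and uses a simplified copy of the class in which list(...) stands in for np.array(...), since both copy the data at construction time. ShallowLoader, DeepLoader and make_constructor are names made up for this example.

```python
import yaml

YAML_TEXT = """\
!MyClass
name: InstanceId
times: [0.0, 0.25, 0.5]
zeros: [0.03, 0.03, 0.04]
"""


class MyClass(object):
    def __init__(self, name, times, zeros):
        self.name = name
        # list(...) copies the values immediately, like np.array(...) in the question.
        self.times = list(times)
        self.zeros = list(zeros)


def make_constructor(deep):
    def construct(loader, node):
        mapping = loader.construct_mapping(node, deep=deep)
        return MyClass(**mapping)
    return construct


# Two loader subclasses so that each variant has its own constructor table.
class ShallowLoader(yaml.SafeLoader):
    pass


class DeepLoader(yaml.SafeLoader):
    pass


ShallowLoader.add_constructor('!MyClass', make_constructor(deep=False))
DeepLoader.add_constructor('!MyClass', make_constructor(deep=True))

shallow = yaml.load(YAML_TEXT, Loader=ShallowLoader)
deep = yaml.load(YAML_TEXT, Loader=DeepLoader)

print(shallow.times)  # nested lists were still empty when they were copied
print(deep.times)     # nested lists were fully built before being copied
```

Without deep=True, PyYAML hands back the nested lists empty and fills them in only after the constructor has returned, which is why copying them inside the constructor captures nothing.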

Upvotes: 1
