Simon1

Reputation: 734

Converting YAML file to dataclass with nested dataclasses and optional keyword arguments

I want to read in a YAML file and convert it into a python dataclass. The goal in this example is to be able to produce the same dataclass.

Without reading a YAML file:

from dataclasses import dataclass, field


@dataclass
class OptionsSource:
    a: str
    b: str = None
    kwargs: dict = field(default_factory=dict)

    def __post_init__(self):
        for k, v in self.kwargs.items():
            setattr(self, k, v)


@dataclass
class OptionsInput:
    file: str
    source: list[OptionsSource] = field(default_factory=list[OptionsSource], kw_only=True)


@dataclass
class Options:
    inputs: OptionsInput = field(default_factory=OptionsInput, kw_only=True)


options = Options(
    inputs=OptionsInput(
        file='file1',
        source=[
            OptionsSource(a=1, b=2, kwargs={'c': 3}),
            OptionsSource(a=10, b=20)
        ]
    ))
>>> print(options)
Options(inputs=OptionsInput(file='file1', source=[OptionsSource(a=1, b=2, kwargs={'c': 3}), OptionsSource(a=10, b=20, kwargs={})]))

>>> print(options.inputs.source[0].c)
3

Now, when I read this YAML, my output is different (i.e., the OptionsSource dataclass isn't used):

import yaml

yaml_input = yaml.load("""
inputs:
    file: file1
    source:
        - a: 1
          b: 2
          c: 3
        - a: 10
          b: 20
""", Loader=yaml.FullLoader)

options_from_yaml = Options(inputs=OptionsInput(**yaml_input['inputs']))
>>> print(options_from_yaml)
Options(inputs=OptionsInput(file='file1', source=[{'a': 1, 'b': 2, 'c': 3}, {'a': 10, 'b': 20}]))

My desired output is for options_from_yaml to match options.

My two problems:

  1. source isn't a list of OptionsSource
  2. I can't figure out how to make the kwargs piece of OptionsSource accept arbitrary keyword arguments and store them so that they can be accessed as options.inputs.source[0].c.

Upvotes: 3

Views: 2362

Answers (4)

Peter V. Mørch

Reputation: 15967

What I ended up using is Fatal1ty/mashumaro (a fast and well-tested serialization library) instead of working with yaml/PyYAML directly.

Here is an example from my blog post Python New Module Guide where I define a @dataclass called SerializeMe, then encode it as JSON and decode it as YAML:

from dataclasses import dataclass
# Basically, JSON, YAML (and TOML) all work the same way.
import mashumaro.codecs.json as json_codec
import mashumaro.codecs.yaml as yaml_codec

@dataclass
class NestedField:
    bool_field: bool

@dataclass
class SerializeMe:
    string_field: str
    int_field: int
    nested: NestedField
    
obj = SerializeMe(
    string_field="hello world",
    int_field=42,
    nested=NestedField(bool_field=True)
)

json = json_codec.encode(obj, SerializeMe)

print(json)
# Prints
# {"string_field": "hello world", "int_field": 42, "nested": {"bool_field": true}}

# Recall that JSON is a subset of YAML, so you can load JSON as YAML.
obj = yaml_codec.decode(json, SerializeMe)

print(obj)
# Prints
# SerializeMe(string_field='hello world', int_field=42, nested=NestedField(bool_field=True))

It also raises errors if the YAML doesn't match the SerializeMe class, which is exactly what I was looking for. It handles JSON and TOML too.

I don't know how it would handle the kwargs part, though; I didn't need that.

Upvotes: 2

Dunes

Reputation: 40833

cattrs does exactly what you want, with a minimum of fuss. You are looking for the structure() function.

Deserialization

import cattrs
import yaml
from your_code import Options

yaml_string: str = ...
new_options = cattrs.structure(yaml.safe_load(yaml_string), Options)
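structure() matches dictionary keys against the declared fields, so for OptionsSource the extra values need to sit under a kwargs key (as in the serialized YAML further down in this answer), whereas the question's YAML puts c next to a and b. A minimal sketch of a pre-processing step that nests such keys before structuring; nest_extras is a hypothetical helper, and the dict literal stands in for what yaml.safe_load returns for the question's YAML.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class OptionsSource:
    a: str
    b: Optional[str] = None
    kwargs: dict = field(default_factory=dict)


def nest_extras(data: dict[str, Any], cls: type) -> dict[str, Any]:
    """Hypothetical helper: move keys that are not fields of `cls` under "kwargs"."""
    names = {f.name for f in fields(cls)}
    nested = {k: v for k, v in data.items() if k in names}
    extras = {k: v for k, v in data.items() if k not in names}
    # Merge with an explicit kwargs mapping if the input already had one.
    nested["kwargs"] = {**nested.get("kwargs", {}), **extras}
    return nested


loaded = {
    "inputs": {
        "file": "file1",
        "source": [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20}],
    }
}
loaded["inputs"]["source"] = [
    nest_extras(s, OptionsSource) for s in loaded["inputs"]["source"]
]
print(loaded)
# The reshaped dict could then be passed to cattrs.structure(loaded, Options).
```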

Serialization

cattrs can also handle serialisation for you. See below for a full round trip.

import cattrs
import yaml
from your_code import Options, OptionsInput, OptionsSource

options = Options(
    inputs=OptionsInput(
        file='file1',
        source=[
            # NB: types of a and b switched to str, to match your type annotations
            OptionsSource(a='1', b='2', kwargs={'c': 3}),
            OptionsSource(a='10', b='20')
        ]
    )
)

# serialize
options_dict = cattrs.unstructure(options)
assert options_dict == {
    'inputs': {
        'file': 'file1',
        'source': [
            {'a': '1', 'b': '2', 'kwargs': {'c': 3}},
            {'a': '10', 'b': '20', 'kwargs': {}}
        ]
    }
}
yaml_string = yaml.dump(options_dict)
assert yaml_string == '''\
inputs:
  file: file1
  source:
  - a: '1'
    b: '2'
    kwargs:
      c: 3
  - a: '10'
    b: '20'
    kwargs: {}
'''

# deserialize
new_options = cattrs.structure(yaml.safe_load(yaml_string), Options)
assert new_options == options

Upvotes: 1

Jean-Francois T.

Reputation: 12950

Another approach, which avoids adding tags throughout the YAML file, is to customize the from_yaml method of each class: inside from_yaml you can add the tags dynamically in Python.

This results in more verbose Python (which could probably be reduced with a common base class that applies the changes automatically based on the type annotations) but cleaner YAML.

For example, to add the tag !options_source to each element of source, you can use the following OptionsInput.from_yaml:

    @classmethod
    def from_yaml(cls, loader, node):
        if isinstance(node, yaml.MappingNode):
            mapping = {k.value: v for k, v in node.value}
            for subnode in mapping["source"].value:
                subnode.tag = "!options_source"

            fields = loader.construct_mapping(node)
            return cls(**fields)

        return super().from_yaml(loader, node)

This results in the following code, where you can also prepend the initial tag to convert the YAML into a given class:


from dataclasses import dataclass, field
from typing import Any, Optional, ClassVar

import yaml


@dataclass
class OptionsSource(yaml.YAMLObject):
    """Options for a given source"""

    yaml_tag: ClassVar[str] = "!options_source"  # type: ignore
    a: str
    b: Optional[str] = None
    kwargs: Optional[dict[str, Any]] = None

    def __getattribute__(self, name: str):
        try:
            return super().__getattribute__(name)
        except AttributeError:
            if self.kwargs and name in self.kwargs:
                return self.kwargs[name]
            raise


@dataclass
class OptionsInput(yaml.YAMLObject):
    """Options for input"""

    yaml_tag: ClassVar[str] = "!options_input"  # type: ignore
    file: str
    source: list[OptionsSource] = field(default_factory=list, kw_only=True)

    @classmethod
    def from_yaml(cls, loader, node):
        if isinstance(node, yaml.MappingNode):
            mapping = {k.value: v for k, v in node.value}
            for subnode in mapping["source"].value:
                subnode.tag = OptionsSource.yaml_tag

            fields = loader.construct_mapping(node)
            return cls(**fields)

        return super().from_yaml(loader, node)


@dataclass
class Options(yaml.YAMLObject):
    """Options of the program"""

    yaml_tag: ClassVar[str] = "!options"  # type: ignore
    inputs: OptionsInput = field(default_factory=OptionsInput, kw_only=True)  # type: ignore

    @classmethod
    def from_yaml(cls, loader, node):
        if isinstance(node, yaml.MappingNode):
            mapping = {k.value: v for k, v in node.value}
            mapping["inputs"].tag = OptionsInput.yaml_tag

            fields = loader.construct_mapping(node)
            return cls(**fields)

        return super().from_yaml(loader, node)


YAML = """
inputs:
    file: file1
    source:
        - a: 1
          b: 2
          kwargs:
            c: 3
        - a: 10
          b: 20
"""


options_from_yaml = yaml.load(Options.yaml_tag + YAML, Loader=yaml.FullLoader)
print(options_from_yaml)

And we get the following result:

Options(inputs=OptionsInput(file='file1', source=[OptionsSource(a=1, b=2, kwargs={'c': 3}), OptionsSource(a=10, b=20, kwargs=None)]))

Upvotes: 1

Jean-Francois T.

Reputation: 12950

Actually, the answer can be found in the PyYAML documentation (although it was written for Python 2.7 and hasn't been updated since :)

The idea is the following:

  1. Assign a YAML tag to each class by deriving from yaml.YAMLObject and setting a class attribute yaml_tag, which automatically registers that tag for YAML parsing (optionally, also set yaml_loader = yaml.SafeLoader so that it works with yaml.safe_load)
  2. Use this tag in the YAML file (with !my_tag at the right location)
  3. ... and since you have a rather weird mechanism for extra parameters using kwargs, you need to keep the same mechanism in your YAML file (put the extras under a kwargs key) and probably replace the __post_init__ with a custom __getattribute__ that gets attributes from kwargs when needed.
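For step 3, a dunder that fits slightly better than overriding __getattribute__ is __getattr__: Python calls it only after normal attribute lookup has failed, so the try/except around super().__getattribute__ is no longer needed. This is a sketch of that alternative (a substitution, not what this answer's code uses), keeping the kwargs: Optional[dict] field from the answer and leaving out the yaml.YAMLObject base so it runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OptionsSource:
    a: str
    b: Optional[str] = None
    kwargs: Optional[dict[str, Any]] = None

    def __getattr__(self, name: str) -> Any:
        # Python only calls __getattr__ after normal lookup has failed, so
        # declared fields and methods always take precedence over kwargs.
        if name == "kwargs":
            # Defensive: never look kwargs up inside itself (avoids recursion).
            raise AttributeError(name)
        extras = self.kwargs or {}
        if name in extras:
            return extras[name]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


src = OptionsSource(a="1", b="2", kwargs={"c": 3})
print(src.c)  # 3
print(hasattr(src, "d"))  # False
```

Raising AttributeError for unknown names keeps hasattr() and getattr(obj, name, default) working as usual.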

This would give something like this for the modification:

from dataclasses import dataclass, field
from typing import Any, Optional

import yaml


@dataclass
class OptionsSource(yaml.YAMLObject):
    """Options for a given source"""

    yaml_tag = "!options_source"
    a: str
    b: Optional[str] = None
    kwargs: Optional[dict[str, Any]] = None

    def __getattribute__(self, name: str):
        try:
            return super().__getattribute__(name)
        except AttributeError:
            if self.kwargs and name in self.kwargs:
                return self.kwargs[name]
            raise


@dataclass
class OptionsInput(yaml.YAMLObject):
    """Options for input"""

    yaml_tag = "!options_input"
    file: str
    source: list[OptionsSource] = field(default_factory=list, kw_only=True)

...

And the YAML would be updated as follows:

yaml_input = yaml.load(
    """
inputs: !options_input
    file: file1
    source:
        - !options_source
          a: 1
          b: 2
          kwargs:
            c: 3
        - !options_source
          a: 10
          b: 20
""",
    Loader=yaml.FullLoader,
)

options_from_yaml = Options(inputs=yaml_input["inputs"])

And we would get:

>>> print(options_from_yaml)
Options(inputs=OptionsInput(file='file1', source=[OptionsSource(a=1, b=2, kwargs={'c': 3}), OptionsSource(a=10, b=20, kwargs=None)]))

Upvotes: 0
