Reputation: 734
I want to read in a YAML file and convert it into a python dataclass. The goal in this example is to be able to produce the same dataclass.
Without reading a YAML file:
from dataclasses import dataclass, field

@dataclass
class OptionsSource:
    a: str
    b: str = None
    kwargs: dict = field(default_factory=dict)

    def __post_init__(self):
        for k, v in self.kwargs.items():
            setattr(self, k, v)

@dataclass
class OptionsInput:
    file: str
    source: list[OptionsSource] = field(default_factory=list[OptionsSource], kw_only=True)

@dataclass
class Options:
    inputs: OptionsInput = field(default_factory=OptionsInput, kw_only=True)

options = Options(
    inputs=OptionsInput(
        file='file1',
        source=[
            OptionsSource(a=1, b=2, kwargs={'c': 3}),
            OptionsSource(a=10, b=20)
        ]
    ))
>>> print(options)
Options(inputs=OptionsInput(file='file1', source=[OptionsSource(a=1, b=2, kwargs={'c': 3}), OptionsSource(a=10, b=20, kwargs={})]))
>>> print(options.inputs.source[0].c)
3
Now, when I read this YAML, my output is different (i.e., the OptionsSource dataclass isn't used).
import yaml

yaml_input = yaml.load("""
inputs:
  file: file1
  source:
    - a: 1
      b: 2
      c: 3
    - a: 10
      b: 20
""", Loader=yaml.FullLoader)
options_from_yaml = Options(inputs=OptionsInput(**yaml_input['inputs']))
>>> print(options_from_yaml)
Options(inputs=OptionsInput(file='file1', source=[{'a': 1, 'b': 2, 'c': 3}, {'a': 10, 'b': 20}]))
My desired output is for options_from_yaml to match options.
My two problems:
1. source isn't a list of OptionsSource.
2. Getting the kwargs piece of OptionsSource to let me provide any keyword arguments and have them stored, so they can be accessed with options.inputs.source[0].c.
Upvotes: 3
Views: 2362
Reputation: 15967
What I ended up using is Fatal1ty/mashumaro ("Fast and well tested serialization library"), and not yaml/PyYAML at all.
Here is an example from my blog post Python New Module Guide, where I define a @dataclass called SerializeMe, encode it as JSON, and then decode it as YAML:
from dataclasses import dataclass
# Basically, JSON, YAML (and TOML) all work the same way.
import mashumaro.codecs.json as json_codec
import mashumaro.codecs.yaml as yaml_codec
@dataclass
class NestedField:
    bool_field: bool

@dataclass
class SerializeMe:
    string_field: str
    int_field: int
    nested: NestedField

obj = SerializeMe(
    string_field="hello world",
    int_field=42,
    nested=NestedField(bool_field=True)
)
json = json_codec.encode(obj, SerializeMe)
print(json)
# Prints
# {"string_field": "hello world", "int_field": 42, "nested": {"bool_field": true}}
# Recall that JSON is a subset of YAML, so you can load JSON as YAML.
obj = yaml_codec.decode(json, SerializeMe)
print(obj)
# Prints
# SerializeMe(string_field='hello world', int_field=42, nested=NestedField(bool_field=True))
It also raises errors if the YAML doesn't match the SerializeMe class, and so on. Exactly what I was looking for. It handles JSON and TOML as well.
I don't know about the kwargs part, though; I didn't need that.
Upvotes: 2
Reputation: 40833
cattrs does exactly what you want, with a minimum of fuss. You are looking for the structure() function.
import cattrs
import yaml
from your_code import Options
yaml_string: str = ...
new_options = cattrs.structure(yaml.safe_load(yaml_string), Options)
cattrs can also handle serialisation for you. See below for a full round trip.
import cattrs
import yaml
options = Options(
    inputs=OptionsInput(
        file='file1',
        source=[
            # NB: types of a and b switched to str, to match your type annotations
            OptionsSource(a='1', b='2', kwargs={'c': 3}),
            OptionsSource(a='10', b='20')
        ]
    )
)

# serialize
options_dict = cattrs.unstructure(options)
assert options_dict == {
    'inputs': {
        'file': 'file1',
        'source': [
            {'a': '1', 'b': '2', 'kwargs': {'c': 3}},
            {'a': '10', 'b': '20', 'kwargs': {}}
        ]
    }
}
yaml_string = yaml.dump(options_dict)
assert yaml_string == '''\
inputs:
  file: file1
  source:
  - a: '1'
    b: '2'
    kwargs:
      c: 3
  - a: '10'
    b: '20'
    kwargs: {}
'''
# deserialize
new_options = cattrs.structure(yaml.safe_load(yaml_string), Options)
assert new_options == options
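The round trip above nests c under kwargs, whereas the YAML in the question puts c next to a and b. One way to bridge the two is a small preprocessing step on the loaded dict before structuring it. The following is only a sketch using the standard library: the helper name fold_extras and its bucket parameter are hypothetical (not part of cattrs or PyYAML), and OptionsSource is repeated from the question so the snippet runs on its own.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class OptionsSource:
    a: str
    b: str = None
    kwargs: dict = field(default_factory=dict)

    def __post_init__(self):
        # Same mechanism as in the question: expose extras as attributes.
        for k, v in self.kwargs.items():
            setattr(self, k, v)


def fold_extras(raw: dict[str, Any], cls: type, bucket: str = "kwargs") -> dict[str, Any]:
    """Hypothetical helper: move keys that are not fields of cls under `bucket`."""
    known = {f.name for f in fields(cls)}
    out = {k: v for k, v in raw.items() if k in known}
    extras = {k: v for k, v in raw.items() if k not in known}
    if extras:
        # Merge with an existing bucket, if the input already had one.
        out[bucket] = {**(out.get(bucket) or {}), **extras}
    return out


# Shape of one `source` item as written in the question's YAML.
raw_source = {"a": "1", "b": "2", "c": 3}
folded = fold_extras(raw_source, OptionsSource)
source = OptionsSource(**folded)
print(folded)
print(source.c)
```

Applying such a helper to each item of source before calling cattrs.structure() would let the YAML stay flat; whether that is preferable to a nested kwargs mapping is a design choice.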
Upvotes: 1
Reputation: 12950
Another approach, which avoids adding multiple tags to the YAML file, consists in customizing the from_yaml of each class: in from_yaml, you can add the tag dynamically using Python.
This results in more verbose Python (which could probably be optimized by using a common class that automatically performs the changes based on the annotations) but cleaner YAML.
For example, to add the tag !options_source to each element of source, you can have the following OptionsInput.from_yaml:
@classmethod
def from_yaml(cls, loader, node):
    if isinstance(node, yaml.MappingNode):
        mapping = {k.value: v for k, v in node.value}
        for subnode in mapping["source"].value:
            subnode.tag = "!options_source"
        fields = loader.construct_mapping(node)
        return cls(**fields)
    return super().from_yaml(loader, node)
This results in the following code, where you can also add the initial tag to convert the YAML into a given class:
from dataclasses import dataclass, field
from typing import Any, Optional, ClassVar
import yaml
@dataclass
class OptionsSource(yaml.YAMLObject):
    """Options for a given source"""

    yaml_tag: ClassVar[str] = "!options_source"  # type: ignore
    a: str
    b: Optional[str] = None
    kwargs: Optional[dict[str, Any]] = None

    def __getattribute__(self, name: str):
        try:
            return super().__getattribute__(name)
        except AttributeError:
            if self.kwargs and name in self.kwargs:
                return self.kwargs[name]
            raise

@dataclass
class OptionsInput(yaml.YAMLObject):
    """Options for input"""

    yaml_tag: ClassVar[str] = "!options_input"  # type: ignore
    file: str
    source: list[OptionsSource] = field(default_factory=list, kw_only=True)

    @classmethod
    def from_yaml(cls, loader, node):
        if isinstance(node, yaml.MappingNode):
            mapping = {k.value: v for k, v in node.value}
            for subnode in mapping["source"].value:
                subnode.tag = OptionsSource.yaml_tag
            fields = loader.construct_mapping(node)
            return cls(**fields)
        return super().from_yaml(loader, node)

@dataclass
class Options(yaml.YAMLObject):
    """Options of the program"""

    yaml_tag: ClassVar[str] = "!options"  # type: ignore
    inputs: OptionsInput = field(default_factory=OptionsInput, kw_only=True)  # type: ignore

    @classmethod
    def from_yaml(cls, loader, node):
        if isinstance(node, yaml.MappingNode):
            mapping = {k.value: v for k, v in node.value}
            mapping["inputs"].tag = OptionsInput.yaml_tag
            fields = loader.construct_mapping(node)
            return cls(**fields)
        return super().from_yaml(loader, node)

YAML = """
inputs:
  file: file1
  source:
    - a: 1
      b: 2
      kwargs:
        c: 3
    - a: 10
      b: 20
"""
options_from_yaml = yaml.load(Options.yaml_tag + YAML, Loader=yaml.FullLoader)
print(options_from_yaml)
And we get the following result:
Options(inputs=OptionsInput(file='file1', source=[OptionsSource(a=1, b=2, kwargs={'c': 3}), OptionsSource(a=10, b=20, kwargs=None)]))
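The "common class based on the annotations" idea mentioned above can also be sketched without touching YAML nodes at all, by converting the already-loaded dict using the dataclass type hints. This is a minimal illustration, not a replacement for the from_yaml approach: the function name from_dict is hypothetical, the dataclasses are simplified copies (no yaml.YAMLObject base, no kw_only), and only nested dataclasses and list[...] annotations are handled.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Optional, get_args, get_origin, get_type_hints


def from_dict(cls: Any, data: Any) -> Any:
    """Hypothetical helper: build cls from plain dicts/lists, guided by annotations."""
    if is_dataclass(cls) and isinstance(data, dict):
        hints = get_type_hints(cls)
        kwargs = {
            f.name: from_dict(hints[f.name], data[f.name])
            for f in fields(cls)
            if f.name in data  # missing keys fall back to the field defaults
        }
        return cls(**kwargs)
    if get_origin(cls) is list and isinstance(data, list):
        (item_type,) = get_args(cls)
        return [from_dict(item_type, item) for item in data]
    return data  # scalars and other annotations pass through unchanged


@dataclass
class OptionsSource:
    a: str
    b: Optional[str] = None
    kwargs: Optional[dict[str, Any]] = None


@dataclass
class OptionsInput:
    file: str
    source: list[OptionsSource] = field(default_factory=list)


@dataclass
class Options:
    inputs: OptionsInput


# Same structure as yaml.load would return for the YAML above (without tags).
raw = {
    "inputs": {
        "file": "file1",
        "source": [
            {"a": 1, "b": 2, "kwargs": {"c": 3}},
            {"a": 10, "b": 20},
        ],
    }
}
options = from_dict(Options, raw)
print(options)
```

Like the PyYAML version, this sketch does not validate or coerce scalar types, so a stays an int even though it is annotated as str.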
Upvotes: 1
Reputation: 12950
Actually, the answer can be found in the documentation of PyYAML (although it was written for Python 2.7 and has not been updated since).
The idea is the following:
- Make each class inherit from yaml.YAMLObject and set a class attribute yaml_tag (and optionally yaml_loader = yaml.SafeLoader to make it work for yaml.safe_load), which automatically registers this tag when parsing YAML.
- Tag the matching nodes in the YAML file (i.e., put !my_tag at the right location).
- This does not handle kwargs specially, so you need to keep the same mechanism in your YAML file and probably replace the __post_init__ by a custom __getattribute__ to get attributes from the kwargs if needed.
This would give something like this for the modification:
from dataclasses import dataclass, field
from typing import Any, Optional
import yaml
@dataclass
class OptionsSource(yaml.YAMLObject):
    """Options for a given source"""

    yaml_tag = "!options_source"
    a: str
    b: Optional[str] = None
    kwargs: Optional[dict[str, Any]] = None

    def __getattribute__(self, name: str):
        try:
            return super().__getattribute__(name)
        except AttributeError:
            if self.kwargs and name in self.kwargs:
                return self.kwargs[name]
            raise

@dataclass
class OptionsInput(yaml.YAMLObject):
    """Options for input"""

    yaml_tag = "!options_input"
    file: str
    source: list[OptionsSource] = field(default_factory=list, kw_only=True)
...
And the YAML would be updated as follows:
yaml_input = yaml.load(
    """
inputs: !options_input
  file: file1
  source:
    - !options_source
      a: 1
      b: 2
      kwargs:
        c: 3
    - !options_source
      a: 10
      b: 20
""",
    Loader=yaml.FullLoader,
)
options_from_yaml = Options(inputs=yaml_input["inputs"])
And we would get:
>>> print(options_from_yaml)
Options(inputs=OptionsInput(file='file1', source=[OptionsSource(a=1, b=2, kwargs={'c': 3}), OptionsSource(a=10, b=20, kwargs=None)]))
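As a side note on the attribute fallback used above: Python only calls __getattr__ after normal lookup has failed, so the same behaviour can probably be had without overriding __getattribute__. The sketch below is an alternative under that assumption, shown on a plain dataclass (no yaml.YAMLObject base) so it runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OptionsSource:
    a: str
    b: Optional[str] = None
    kwargs: Optional[dict[str, Any]] = None

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup has already failed.
        # Read kwargs through __dict__ so this method cannot call itself.
        extras = self.__dict__.get("kwargs") or {}
        try:
            return extras[name]
        except KeyError:
            raise AttributeError(name) from None


src = OptionsSource(a="1", b="2", kwargs={"c": 3})
print(src.c)  # looked up in kwargs via __getattr__
print(src.a)  # regular field, __getattr__ is not involved
```

Names that are neither fields nor keys of kwargs still raise AttributeError, so hasattr() keeps working as expected.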
Upvotes: 0