I have a simple pydantic model with nested data structures.
I want to be able to save and load instances of this model as a .json file.
All models inherit from a Base class with a simple configuration.
import pydantic

class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'  # forbid use of extra kwargs
There are some simple data models with inheritance:
class Thing(Base):
    thing_id: int

class SubThing(Thing):
    name: str

And a Container class, which holds a Thing:
class Container(Base):
    thing: Thing
I can create a Container instance and save it as .json:
# make instance of container
c = Container(
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)
json_string = c.json(indent=2)
print(json_string)
"""
{
  "thing": {
    "thing_id": 1,
    "name": "my_thing"
  }
}
"""
but the json string does not specify that the thing field was constructed using a SubThing. As such, when I try to load this string into a new Container instance, I get an error.
c = Container.parse_raw(json_string)
"""
Traceback (most recent call last):
  File "...", line 36, in <module>
    c = Container.parse_raw(json_string)
  File "pydantic/main.py", line 601, in pydantic.main.BaseModel.parse_raw
  File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Container
thing -> name
  extra fields not permitted (type=value_error.extra)
"""
Is there a simple way to save the Container instance while retaining information about the class type of thing, such that I can reconstruct the initial Container instance reliably? I would like to avoid pickling the object if possible.
One possible solution is to serialize manually, for example using
def serialize(attr_name, attr_value, dictionary=None):
    if dictionary is None:
        dictionary = {}
    if not isinstance(attr_value, pydantic.BaseModel):
        # plain values are stored as-is
        dictionary[attr_name] = attr_value
    else:
        # nested models are stored under their class name
        sub_dictionary = {}
        for (sub_name, sub_value) in attr_value:
            serialize(sub_name, sub_value, dictionary=sub_dictionary)
        dictionary[attr_name] = {type(attr_value).__name__: sub_dictionary}
    return dictionary
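# Note: this example assumes Container also declares a `container_name: str`
# field; with the Container defined above, extra = 'forbid' would reject it.
c1 = Container(
    container_name='my_container',
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)
from pprint import pprint
pprint(serialize('Container', c1))
{'Container': {'Container': {'container_name': 'my_container',
                             'thing': {'SubThing': {'name': 'my_thing',
                                                    'thing_id': 1}}}}}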
but this gets rid of most of the benefits of leveraging the package for serialization.
Upvotes: 1
Views: 12434
Since pydantic 2.0, pydantic no longer serializes subclass-only fields by default: a field annotated with a base class is dumped using only the fields of that base class, whether the output is a dict, a string, or JSON.
They do this to
[...] ensure that you know precisely which fields could be included when serializing, even if subclasses get passed when instantiating the object. In particular, this can help prevent surprises when adding sensitive information like secrets as fields of subclasses.
See the corresponding warning in the pydantic migration guide.
The suggested solution is to serialize with duck typing:
from pydantic import BaseModel, SerializeAsAny

class Thing(BaseModel):
    thing_id: int

class SubThing(Thing):
    name: str

class Container(BaseModel):
    thing: SerializeAsAny[Thing]
This solved the problem for me: .dict() (deprecated in pydantic 2) and .model_dump() now include the subclass fields as intended.
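To make the difference concrete, here is a minimal sketch, assuming pydantic 2 is installed. The names PlainContainer and DuckContainer are hypothetical and exist only to compare the two annotations side by side. It also shows a caveat: SerializeAsAny only affects dumping, so loading the dumped data still validates against the annotated Thing.

```python
from pydantic import BaseModel, SerializeAsAny


class Thing(BaseModel):
    thing_id: int


class SubThing(Thing):
    name: str


class PlainContainer(BaseModel):  # hypothetical name, for comparison only
    # Serialized strictly as `Thing`: subclass-only fields are dropped.
    thing: Thing


class DuckContainer(BaseModel):  # hypothetical name, for comparison only
    # Serialized according to the runtime type of the value.
    thing: SerializeAsAny[Thing]


sub = SubThing(thing_id=1, name='my_thing')

plain_dump = PlainContainer(thing=sub).model_dump()
duck_dump = DuckContainer(thing=sub).model_dump()

print(plain_dump)  # only the base-class field `thing_id`
print(duck_dump)   # includes the subclass field `name`

# Loading validates against the annotation again, so the result is a plain
# `Thing` (the default extra='ignore' silently drops `name`).
reloaded = DuckContainer.model_validate(duck_dump)
print(type(reloaded.thing).__name__)
```

So this fixes the dump side of the question, but on its own it does not restore the SubThing when the data is loaded back.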
Upvotes: 2
Try this solution, which I was able to get working with pydantic. It's a bit ugly and somewhat hackish, but at least it works as expected.
import pydantic


class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'  # forbid use of extra kwargs


class Thing(Base):
    thing_id: int


class SubThing(Thing):
    name: str


class Container(Base):
    thing: Thing

    def __init__(self, **kwargs):
        # This answer helped steer me towards this solution:
        #   https://stackoverflow.com/a/66582140/10237506
        # If `thing` arrives as a plain dict, build a `SubThing` from it.
        if not isinstance(kwargs['thing'], SubThing):
            kwargs['thing'] = SubThing(**kwargs['thing'])
        super().__init__(**kwargs)


def main():
    # make instance of container
    c1 = Container(
        thing=SubThing(
            thing_id=1,
            name='my_thing')
    )

    d = c1.dict()
    print(d)
    # {'thing': {'thing_id': 1, 'name': 'my_thing'}}

    # Now it works!
    c2 = Container(**d)

    print(c2)
    # thing=SubThing(thing_id=1, name='my_thing')

    # assert that the values of the de-serialized instance are the same
    assert c1 == c2


if __name__ == '__main__':
    main()
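The __init__ override above hard-codes SubThing, so it cannot tell several subclasses apart. A less hackish variant is to store a tag in the data itself and let pydantic pick the class from it. The following is only a sketch, assuming pydantic 2; the `kind` field and its values are hypothetical names introduced for illustration, and the extra field does change the JSON layout.

```python
from typing import Literal, Union

from pydantic import BaseModel, ConfigDict, Field


class Base(BaseModel):
    model_config = ConfigDict(extra='forbid')  # forbid use of extra kwargs


class Thing(Base):
    kind: Literal['Thing'] = 'Thing'  # hypothetical tag field
    thing_id: int


class SubThing(Thing):
    kind: Literal['SubThing'] = 'SubThing'  # overrides the tag of the base
    name: str


class Container(Base):
    # The tag stored in `kind` selects which class is used for validation.
    thing: Union[SubThing, Thing] = Field(discriminator='kind')


c1 = Container(thing=SubThing(thing_id=1, name='my_thing'))

# The tag is written into the JSON together with the other fields...
json_string = c1.model_dump_json()
print(json_string)

# ...and is read back on load to rebuild the right class.
c2 = Container.model_validate_json(json_string)
print(c2)
```

The trade-off is that every class taking part in the union needs its own tag value, and the union annotation has to list each of them.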
If you don't need some of the features that pydantic provides, such as data validation, you can use normal dataclasses easily enough. You can pair them with a (de)serialization library like dataclass-wizard, which provides automatic case transforms and type conversion (for example, string to an annotated int) that work much the same as they do with pydantic. Here is a straightforward usage example:
from dataclasses import dataclass

from dataclass_wizard import asdict, fromdict


@dataclass
class Thing:
    thing_id: int


@dataclass
class SubThing(Thing):
    name: str


@dataclass
class Container:
    # Note: I had to update the annotation to `SubThing`. Otherwise,
    # when de-serializing, it creates a `Thing` instance, which is not
    # what we want.
    thing: SubThing


def main():
    # make instance of container
    c1 = Container(
        thing=SubThing(
            thing_id=1,
            name='my_thing')
    )

    d = asdict(c1)
    print(d)
    # {'thing': {'thingId': 1, 'name': 'my_thing'}}

    # De-serialize a dict object into a new `Container` instance
    c2 = fromdict(Container, d)

    print(c2)
    # Container(thing=SubThing(thing_id=1, name='my_thing'))

    # assert that the values of the de-serialized instance are the same
    assert c1 == c2


if __name__ == '__main__':
    main()
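If changing the annotation to SubThing is not an option (say, because several subclasses of Thing can appear), the class name can be stored next to the data instead. Below is a rough standard-library-only sketch of that idea. REGISTRY, to_tagged, from_tagged and the '__type__' key are hypothetical names made up for this example, not part of dataclasses or dataclass-wizard, and it only handles dataclasses nested directly as field values (not inside lists or dicts).

```python
import json
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Thing:
    thing_id: int


@dataclass
class SubThing(Thing):
    name: str


@dataclass
class Container:
    thing: Thing  # the base-class annotation can stay as it is


# Hypothetical registry: maps a stored class name back to the class.
REGISTRY = {cls.__name__: cls for cls in (Thing, SubThing, Container)}


def to_tagged(obj):
    """Turn a dataclass instance into a dict that records its class name."""
    if is_dataclass(obj) and not isinstance(obj, type):
        data = {f.name: to_tagged(getattr(obj, f.name)) for f in fields(obj)}
        data['__type__'] = type(obj).__name__
        return data
    return obj


def from_tagged(data):
    """Rebuild dataclass instances from dicts produced by `to_tagged`."""
    if isinstance(data, dict) and '__type__' in data:
        payload = dict(data)  # copy, so the input is not mutated
        cls = REGISTRY[payload.pop('__type__')]
        return cls(**{k: from_tagged(v) for k, v in payload.items()})
    return data


c1 = Container(thing=SubThing(thing_id=1, name='my_thing'))

json_string = json.dumps(to_tagged(c1), indent=2)
c2 = from_tagged(json.loads(json_string))

print(c2)
assert c1 == c2
```

Unlike pickling, only class names listed in the registry can be instantiated, so loading untrusted JSON cannot construct arbitrary objects. Note that this sketch does no validation or type conversion of the field values.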
Upvotes: 2