twhughes

Reputation: 534

Exporting and Loading nested Pydantic models

I have a simple pydantic model with nested data structures. I want to be able to save instances of this model to a .json file and load them back.

All models inherit from a Base class with simple configuration.

class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'   # forbid use of extra kwargs

There are some simple data models with inheritance

class Thing(Base):
    thing_id: int

class SubThing(Thing):
    name: str

And a Container class, which holds a Thing

class Container(Base):
    thing: Thing

I can create a Container instance and save it as .json

# make instance of container
c = Container(
    thing = SubThing(
        thing_id=1,
        name='my_thing')
)

json_string = c.json(indent=2)
print(json_string)

"""
{
  "thing": {
    "thing_id": 1,
    "name": "my_thing"
  }
}
"""

However, the JSON string does not record that the thing field was constructed from a SubThing. As a result, when I try to load this string into a new Container instance, I get an error.

c = Container.parse_raw(json_string)
print(c)
"""
Traceback (most recent call last):
  File "...", line 36, in <module>
    c = Container.parse_raw(json_string)
  File "pydantic/main.py", line 601, in pydantic.main.BaseModel.parse_raw
  File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Container
thing -> name
  extra fields not permitted (type=value_error.extra)
"""

Is there a simple way to save the Container instance while retaining information about the thing class type such that I can reconstruct the initial Container instance reliably? I would like to avoid pickling the object if possible.

One possible solution is to serialize manually, for example using


def serialize(attr_name, attr_value, dictionary=None):
    if dictionary is None:
        dictionary = {}
    if not isinstance(attr_value, pydantic.BaseModel):
        dictionary[attr_name] = attr_value
    else:
        sub_dictionary = {}
        for (sub_name, sub_value) in attr_value:
            serialize(sub_name, sub_value, dictionary=sub_dictionary)
        dictionary[attr_name] = {type(attr_value).__name__: sub_dictionary}
    return dictionary


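# Note: this assumes Container also declares `container_name: str`;
# with extra = 'forbid', the Container defined above would reject it.
c1 = Container(
    container_name='my_container',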
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)

from pprint import pprint as print
print(serialize('Container', c1))

{'Container': {'Container': {'container_name': 'my_container',
                             'thing': {'SubThing': {'name': 'my_thing',
                                                    'thing_id': 1}}}}}

but this gets rid of most of the benefits of leveraging the package for serialization.
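For completeness, the reverse direction of this manual approach could be sketched as below. It assumes the `{ClassName: {field: value}}` layout shown above with the outer attribute-name wrapper already stripped; `REGISTRY` and `deserialize` are hypothetical names, and the models omit `container_name` and the `extra = 'forbid'` base for brevity.

```python
import pydantic


class Thing(pydantic.BaseModel):
    thing_id: int


class SubThing(Thing):
    name: str


class Container(pydantic.BaseModel):
    thing: Thing


# Hypothetical registry: maps the class names used as tags back to classes.
REGISTRY = {cls.__name__: cls for cls in (Thing, SubThing, Container)}


def deserialize(value):
    """Rebuild a model from a {ClassName: {field: value}} dictionary."""
    if isinstance(value, dict) and len(value) == 1:
        name, fields = next(iter(value.items()))
        if name in REGISTRY and isinstance(fields, dict):
            # Rebuild nested models first, then construct this one.
            kwargs = {k: deserialize(v) for k, v in fields.items()}
            return REGISTRY[name](**kwargs)
    # Plain values (ints, strings, untagged dicts) pass through unchanged.
    return value


tagged = {'Container': {'thing': {'SubThing': {'thing_id': 1, 'name': 'my_thing'}}}}
c = deserialize(tagged)
print(type(c.thing).__name__)
```

One caveat of this layout: an ordinary dict field that happens to have a single key matching a registered class name would be mistaken for a tagged model.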

Upvotes: 1

Views: 12434

Answers (2)

Energeneer

Reputation: 431

Since pydantic 2.0, a field is serialized according to its annotated type by default: when a subclass instance is stored in a field annotated with the base class, only the base-class fields appear in the output of dict, string, JSON, etc.

They do this to

[...] ensure that you know precisely which fields could be included when serializing, even if subclasses get passed when instantiating the object. In particular, this can help prevent surprises when adding sensitive information like secrets as fields of subclasses.

See the migration warning here.

The suggested solution is to serialize with duck typing:

from pydantic import BaseModel, SerializeAsAny

class Thing(BaseModel):
    thing_id: int

class SubThing(Thing):
    name: str

class Container(BaseModel):
    thing: SerializeAsAny[Thing]

This solved the problem for me: .model_dump() (and the deprecated .dict()) now include the subclass fields as intended.
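To make the difference concrete, here is a small side-by-side sketch, assuming pydantic 2.x. `PlainContainer` and `DuckContainer` are hypothetical names used only for this comparison.

```python
from pydantic import BaseModel, SerializeAsAny


class Thing(BaseModel):
    thing_id: int


class SubThing(Thing):
    name: str


class PlainContainer(BaseModel):
    # Serialized using the fields declared on Thing only.
    thing: Thing


class DuckContainer(BaseModel):
    # Serialized using the fields of whatever instance is stored.
    thing: SerializeAsAny[Thing]


sub = SubThing(thing_id=1, name='my_thing')
plain = PlainContainer(thing=sub).model_dump()
duck = DuckContainer(thing=sub).model_dump()

print(plain)
print(duck)

# Loading the dumped dict back still validates against the annotation,
# so the result holds a plain Thing and the extra key is ignored.
reloaded = DuckContainer(**duck)
print(type(reloaded.thing).__name__)
```

Note that `SerializeAsAny` only changes the output side. Reconstructing a `SubThing` on load still needs some type information in the payload or in the annotation.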

Upvotes: 2

Wizard.Ritvik

Reputation: 11612

Try this solution, which I was able to get working with pydantic. It's a bit ugly and somewhat hackish, but at least it works as expected.

import pydantic


class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'   # forbid use of extra kwargs


class Thing(Base):
    thing_id: int


class SubThing(Thing):
    name: str


class Container(Base):
    thing: Thing

    def __init__(self, **kwargs):
        # This answer helped steer me towards this solution:
        #   https://stackoverflow.com/a/66582140/10237506
        if not isinstance(kwargs['thing'], SubThing):
            kwargs['thing'] = SubThing(**kwargs['thing'])
        super().__init__(**kwargs)


def main():
    # make instance of container
    c1 = Container(
        thing=SubThing(
            thing_id=1,
            name='my_thing')
    )

    d = c1.dict()
    print(d)
    # {'thing': {'thing_id': 1, 'name': 'my_thing'}}

    # Now it works!
    c2 = Container(**d)

    print(c2)
    # thing=SubThing(thing_id=1, name='my_thing')
    
    # assert that the values of the de-serialized instance are the same
    assert c1 == c2


if __name__ == '__main__':
    main()
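The `__init__` override above hard-codes `SubThing`, so it only covers a single subclass. A possible alternative, sketched here under the assumption of pydantic 2.x, is to store a tag in the data itself and let a discriminated union pick the class on load. The field name `kind` is an arbitrary choice for this illustration and not part of the original models.

```python
from typing import Literal, Union

from pydantic import BaseModel, ConfigDict, Field


class Base(BaseModel):
    model_config = ConfigDict(extra='forbid')  # forbid use of extra kwargs


class Thing(Base):
    kind: Literal['Thing'] = 'Thing'  # tag written into the output
    thing_id: int


class SubThing(Thing):
    kind: Literal['SubThing'] = 'SubThing'
    name: str


class Container(Base):
    # The value of `kind` selects which class validates the data.
    thing: Union[Thing, SubThing] = Field(discriminator='kind')


c1 = Container(thing=SubThing(thing_id=1, name='my_thing'))

json_string = c1.model_dump_json()
print(json_string)

c2 = Container.model_validate_json(json_string)
print(type(c2.thing).__name__)
```

The trade-off is that every class in the hierarchy has to declare its own tag and be listed in the union, but the JSON then carries enough information to rebuild the original instance without pickling.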

If you don't need pydantic-specific features such as data validation, you can use plain dataclasses instead. You can pair them with a (de)serialization library like dataclass-wizard, which provides automatic case transforms and type conversion (for example, string to an annotated int) that work much the same as they do in pydantic. Here is a straightforward usage example:

from dataclasses import dataclass

from dataclass_wizard import asdict, fromdict


@dataclass
class Thing:
    thing_id: int


@dataclass
class SubThing(Thing):
    name: str


@dataclass
class Container:
    # Note: I had to update the annotation to `SubThing`. otherwise
    # when de-serializing, it creates a `Thing` instance which is not
    # what we want.
    thing: SubThing


def main():
    # make instance of container
    c1 = Container(
        thing=SubThing(
            thing_id=1,
            name='my_thing')
    )

    d = asdict(c1)
    print(d)
    # {'thing': {'thingId': 1, 'name': 'my_thing'}}

    # De-serialize a dict object in a new `Container` instance
    c2 = fromdict(Container, d)

    print(c2)
    # Container(thing=SubThing(thing_id=1, name='my_thing'))

    # assert that the values of the de-serialized instance are the same
    assert c1 == c2


if __name__ == '__main__':
    main()

Upvotes: 2
