2e0byo

Reputation: 5954

Declare JSON encoder on the class itself for Pydantic

I have the following class:

class Thing:
    def __init__(self, x: str):
        self.x = x

    def __str__(self):
        return self.x

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> "Thing":
        return cls(v)

Due to the validator method, I can use this class as a custom field type in a Pydantic model:

from pydantic import BaseModel
from thing import Thing

class Model(BaseModel):
    thing: Thing

But if I want to serialize to JSON, I need to set the json_encoders option on the Pydantic model:

class Model(BaseModel):
    class Config:
        json_encoders = {
             Thing: str
        }
    thing: Thing

Now Pydantic can serialize Thing values to JSON and parse them back. But the configuration lives in two places: partly on Model and partly on the Thing class. I'd like to keep it all on Thing.
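
For example, this round-trips as expected:

instance = Model(thing=Thing("foo"))
raw = instance.json()             # '{"thing": "foo"}'
restored = Model.parse_raw(raw)   # validate() turns "foo" back into a Thing
print(restored.thing)             # foo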

Is there any way to set the json_encoders option on Thing so Pydantic knows how to handle it transparently?

Note that Thing is stripped down here: the real class contains a lot of logic, and I'm not just trying to declare a custom str type.

Upvotes: 1

Views: 1444

Answers (1)

Daniil Fajnberg

Reputation: 18663

This is actually an issue that goes much deeper than Pydantic models, in my opinion. There is an ongoing discussion about whether a standard protocol with a method like __json__ or __serialize__ should be introduced in Python.

The problem is that Pydantic is bound by the same limitation as the standard library's json module: encoding/serialization logic for custom types lives apart from the class itself.

Whether or not the broader idea of introducing such a protocol makes sense, we can piggy-back on it a little and define a customized version of json.dumps that checks for the presence of e.g. a __serialize__ method and uses it as the default function to serialize the object. (See the json.dumps documentation for an explanation of the default parameter.)
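
As a quick refresher: json.dumps calls the default function for any object it cannot serialize natively and encodes whatever that function returns instead. A minimal illustration (the Point class is just a stand-in):

import json

class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x, self.y = x, y

print(json.dumps({"p": Point(1, 2)}, default=lambda p: [p.x, p.y]))
# {"p": [1, 2]}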

Then we can set up a custom base model with the Config.json_dumps option set to that function. That way all child models automatically fall back to it for serialization (unless overridden, for example via the encoder argument to the BaseModel.json method).

Here is an example:

base.py

from collections.abc import Callable
from json import dumps as json_dumps
from typing import Any

from pydantic import BaseModel as PydanticBaseModel


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    # Pydantic's `BaseModel.json` passes its own encoder via the `default` kwarg;
    # outside of Pydantic we fall back to the identity function.
    default: Callable[[object], object] = kwargs.pop("default", lambda x: x)

    def custom_default(to_encode: object) -> object:
        serialize_method = getattr(to_encode, "__serialize__", None)
        if serialize_method is None:
            return default(to_encode)
        return serialize_method()  # <-- already bound to `to_encode`

    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended

application.py

from __future__ import annotations
from collections.abc import Callable, Iterator

from .base import BaseModel


class Thing:
    def __init__(self, x: str) -> None:
        self.x = x

    def __str__(self) -> str:
        return self.x

    def __serialize__(self) -> str:  # <-- this is the magic method
        return self.x

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[..., Thing]]:
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> Thing:
        return cls(v)


class Model(BaseModel):
    thing: Thing
    num: float = 3.14


instance = Model(thing=Thing("foo"))
print(instance.json(indent=4))

Output:

{
    "thing": "foo",
    "num": 3.14
}

Note for Python <3.9 users: Import the Callable and Iterator types from typing instead of collections.abc.
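
And because Thing still defines __get_validators__, the JSON output round-trips back into a model:

restored = Model.parse_raw(instance.json())
print(restored.thing)  # foo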


PS

If you want to be able to re-use this approach to serialization in more places than just the base model, it may be a good idea to put a bit more effort into the types. A runtime_checkable custom protocol for our __serialize__ method may be useful.

Also, we can make the json_dumps_extended function a bit less clunky by using functools.partial.

Here is a slightly more sophisticated version of the suggested base.py:

from collections.abc import Callable
from functools import partial
from json import dumps as json_dumps
from typing import Any, Optional, Protocol, TypeVar, overload, runtime_checkable

from pydantic import BaseModel as PydanticBaseModel

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)
Func1Arg = Callable[[object], T]


@runtime_checkable
class Serializable(Protocol[T_co]):
    def __serialize__(self) -> T_co: ...


@overload
def serialize(obj: Serializable[T_co]) -> T_co: ...


@overload
def serialize(obj: Any, fallback: Func1Arg[T]) -> T: ...


def serialize(obj: Any, fallback: Optional[Func1Arg[Any]] = None) -> Any:
    if isinstance(obj, Serializable):
        return obj.__serialize__()
    if fallback is None:
        raise TypeError(f"Object not serializable: {obj}")
    return fallback(obj)


def _id(x: T) -> T:
    return x


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    custom_default = partial(serialize, fallback=kwargs.pop("default", _id))
    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended
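
With the runtime_checkable protocol in place, the serialize helper can also be used entirely outside of Pydantic. For example, reusing Thing from application.py above:

from .base import Serializable, serialize, json_dumps_extended

thing = Thing("foo")
assert isinstance(thing, Serializable)        # structural check via the protocol
print(serialize(thing))                       # foo
print(json_dumps_extended({"thing": thing}))  # {"thing": "foo"}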

Another alternative might have been to just monkey-patch JSONEncoder.default directly. But without further configuration, Pydantic still supplies its own encoder to json.dumps and rejects the value before that method is even called.
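
For completeness, the monkey-patch would look roughly like this (a sketch of the alternative just mentioned, not a recommendation):

import json

def _patched_default(self: json.JSONEncoder, obj: object) -> object:
    serialize_method = getattr(obj, "__serialize__", None)
    if serialize_method is not None:
        return serialize_method()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

json.JSONEncoder.default = _patched_default

print(json.dumps(Thing("foo")))  # '"foo"' -- plain dumps now works
# Model(thing=Thing("foo")).json() would still fail without the custom json_dumps,
# because Pydantic supplies its own default function, bypassing JSONEncoder.default.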

I don't think we have a better option, until some standard serialization protocol (at least for JSON) is introduced.

Upvotes: 4
