Reputation: 3162
I have a pydantic (v2) BaseModel that can take a polars DataFrame as one of its model fields. I wish to be able to serialize the dataframe. Preferably, I would be able to serialize AND de-serialize it, but I would be happy with just being able to serialize it.
The polars dataframe has a df.write_json() method. My thinking has been to take the JSON output from that method and read it back in via the Python json library, so that it becomes a JSON-serializable dict, and then somehow attach this "encoder" to the pydantic JSON machinery. For deserialization, I would use the pl.read_json() method to produce a dataframe.
Unfortunately, from the pydantic documentation I can only tell how to write a custom serializer for a named field, not for a given type.
There are some docs on serializing subclasses by introducing a __get_pydantic_core_schema__
class method, but I would prefer to avoid this approach, since I would like to be able to use the polars classes directly.
Here is an example where, currently, Foo().model_dump_json() raises PydanticSerializationError: Unable to serialize unknown type: <class 'polars.dataframe.frame.DataFrame'>.
from typing import Any
import json

import polars as pl
from pydantic import BaseModel

df = pl.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
df.write_json()  # this produces a json representation of my dataframe
# {"columns":[{"name":"foo","datatype":"Int64","bit_settings":"","values":[1,2,3]},{"name":"bar","datatype":"Int64","bit_settings":"","values":[4,5,6]}]}
# I could use pl.read_json() to read it back into a dataframe.

def json_serializable_dataframe(df: pl.DataFrame) -> dict[str, Any]:
    """Load a serialized dataframe into a JSON-serializable dict."""
    return json.loads(df.write_json())

class Foo(BaseModel, arbitrary_types_allowed=True):
    df: pl.DataFrame = pl.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

Foo().model_dump_json()  # how to incorporate my json_serializable_dataframe encoder here?
Is there a way to give pydantic the ability to serialize a custom type?
Upvotes: 4
Views: 2858
Reputation: 21404
Can you use @model_serializer and manually look for DataFrames?
import polars as pl
from pydantic import BaseModel, model_serializer

class Foo(BaseModel, arbitrary_types_allowed=True):
    a: pl.DataFrame = pl.DataFrame({"foo": [1], "bar": [2]})
    b: pl.DataFrame = pl.DataFrame({"baz": [3], "omg": [4]})

    @model_serializer
    def serialize(self):
        # build a fresh dict so serializing does not mutate the model's fields
        # (newer polars may need .serialize(format="json") to get a string)
        return {
            name: obj.lazy().serialize() if isinstance(obj, pl.DataFrame) else obj
            for name, obj in self.__dict__.items()
        }
Foo().model_dump_json()
'{"a":"{\\"DataFrameScan\\":{\\"df\\":{\\"columns\\":[{\\"name\\":\\"foo\\"...
note: Polars offers frame (de-)serialization via DataFrame.serialize/DataFrame.deserialize and LazyFrame.serialize/LazyFrame.deserialize.
Upvotes: 4