Reputation: 6359
I am using MongoDB to store the results of a script in a database. When I want to load the data back into Python, I need to decode the JSON (or BSON) string into a pydantic BaseModel. With a pydantic model whose fields are JSON-compatible types, I can just do:
base_model = BaseModelClass.parse_raw(string)
But the default json.loads
decoder doesn't know how to deal with a DataFrame. I can override the .parse_raw
method with something like:
import json

import pandas as pd
from pydantic import BaseModel


class BaseModelClass(BaseModel):
    df: pd.DataFrame

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {
            pd.DataFrame: lambda df: df.to_json()
        }

    @classmethod
    def parse_raw(cls, data):
        data = json.loads(data)
        data['df'] = pd.read_json(data['df'])
        return cls(**data)
But ideally I would want to automatically decode fields of type pd.DataFrame
rather than manually change the parse_raw
function every time. Is there any way of doing something like:
class Config:
    arbitrary_types_allowed = True
    json_encoders = {
        pd.DataFrame: lambda df: df.to_json()
    }
    json_decoders = {
        pd.DataFrame: lambda value: pd.read_json(value)
    }
so that any field annotated as a DataFrame is detected and decoded automatically, without having to modify parse_raw() every time?
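For what it's worth, the parse_raw override above can be made generic by inspecting the model's field annotations instead of hard-coding 'df'. A minimal sketch, assuming pydantic v2 (model_fields) and pandas; parse_raw_with_df and ResultModel are hypothetical names chosen here:

import json
from io import StringIO

import pandas as pd
from pydantic import BaseModel, ConfigDict


def parse_raw_with_df(model_cls: type[BaseModel], raw: str) -> BaseModel:
    # Hypothetical generic helper: decode the JSON payload, then rebuild
    # any field annotated as pd.DataFrame from its JSON-string form.
    data = json.loads(raw)
    for name, field in model_cls.model_fields.items():
        if field.annotation is pd.DataFrame and isinstance(data.get(name), str):
            data[name] = pd.read_json(StringIO(data[name]))
    return model_cls(**data)


class ResultModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    df: pd.DataFrame
    run_id: int

Any model whose DataFrame fields were serialized with df.to_json() can then be reloaded through the one helper instead of a per-model parse_raw.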
Upvotes: 8
Views: 8879
Reputation: 12088
You can define a custom data type with its own validator and serializer, so the conversions are handled automatically:
from typing import Any

import pandas as pd
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema


class myDataFrame(pd.DataFrame):
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        validate = core_schema.no_info_plain_validator_function(cls.try_parse_to_df)
        return core_schema.json_or_python_schema(
            json_schema=validate,
            python_schema=validate,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda df: df.to_json()
            ),
        )

    @classmethod
    def try_parse_to_df(cls, value: Any):
        if isinstance(value, str):
            return pd.read_json(value)
        return value
# Create a model with your custom type
class BaseModelClass(BaseModel):
df: myDataFrame
# Create your model
sample_df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])
my_model = BaseModelClass(df=sample_df)
# Should also be able to parse from json
my_model = BaseModelClass(df=sample_df.to_json())
# Even more dramatically
my_model_2 = BaseModelClass.model_validate_json(my_model.model_dump_json())
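An alternative in pydantic v2 that avoids subclassing DataFrame is to attach the same two hooks to a plain pd.DataFrame annotation with Annotated. BeforeValidator and PlainSerializer are pydantic v2 helpers; PandasDataFrame and FrameModel are just names chosen for this sketch:

from io import StringIO
from typing import Annotated, Any

import pandas as pd
from pydantic import BaseModel, BeforeValidator, ConfigDict, PlainSerializer


def _to_df(value: Any) -> pd.DataFrame:
    # Accept either a ready DataFrame or its JSON-string form.
    if isinstance(value, str):
        return pd.read_json(StringIO(value))
    return value


PandasDataFrame = Annotated[
    pd.DataFrame,
    BeforeValidator(_to_df),
    PlainSerializer(lambda df: df.to_json(), return_type=str),
]


class FrameModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    df: PandasDataFrame


sample_df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
m = FrameModel(df=sample_df)
restored = FrameModel.model_validate_json(m.model_dump_json())

This keeps the field typed as a real pd.DataFrame (no subclass involved) and the JSON round trip still works, at the cost of needing arbitrary_types_allowed on the model.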
Upvotes: 2