Tom McLean

Reputation: 6359

How can I decode a JSON string into a pydantic model with a dataframe field?

I am using MongoDB to store the results of a script in a database. When I reload the data into Python, I need to decode the JSON (or BSON) string into a pydantic BaseModel. For a model whose fields are all JSON-compatible types, I can just do:

base_model = BaseModelClass.parse_raw(string)

But the default json.loads decoder doesn't know how to deal with a DataFrame. I can override the .parse_raw method with something like:

import json
from io import StringIO

import pandas as pd
from pydantic import BaseModel

class BaseModelClass(BaseModel):
    df: pd.DataFrame

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {
            pd.DataFrame: lambda df: df.to_json()
        }

    @classmethod
    def parse_raw(cls, data):
        data = json.loads(data)
        # The frame was stored as a nested JSON string, so decode it separately
        data['df'] = pd.read_json(StringIO(data['df']))
        return cls(**data)

But ideally I would want to automatically decode fields of type pd.DataFrame rather than manually change the parse_raw function every time. Is there any way of doing something like:

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {
            pd.DataFrame: lambda df: df.to_json()
        }
        json_decoders = {
            pd.DataFrame: lambda value: pd.read_json(value)
        }

so that any field annotated as a DataFrame is automatically converted to one, without having to modify parse_raw() for each model?

Upvotes: 8

Views: 8879

Answers (1)

Yaakov Bressler

Reputation: 12088

Pydantic V2:

You can define a custom data type whose core schema specifies both a validator and a serializer, so the conversion happens automatically in both directions:

from io import StringIO
from typing import Any

import pandas as pd
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema


class myDataFrame(pd.DataFrame):

    @classmethod
    def __get_pydantic_core_schema__(
            cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        # Use the same validator for JSON input and for Python input
        validate = core_schema.no_info_plain_validator_function(cls.try_parse_to_df)

        return core_schema.json_or_python_schema(
            json_schema=validate,
            python_schema=validate,
            # Dump the frame as a JSON string when the model is serialized
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda df: df.to_json()
            ),
        )

    @classmethod
    def try_parse_to_df(cls, value: Any):
        # Strings are treated as JSON; StringIO avoids the pandas deprecation
        # of passing literal JSON strings to read_json
        if isinstance(value, str):
            return pd.read_json(StringIO(value))
        # Anything else (e.g. an existing DataFrame) passes through unchecked
        return value


# Create a model with your custom type
class BaseModelClass(BaseModel):
    df: myDataFrame


# Create your model
sample_df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])
my_model = BaseModelClass(df=sample_df)

# A JSON string is also accepted and parsed into a DataFrame
my_model = BaseModelClass(df=sample_df.to_json())

# Full round trip: dump the model to JSON, then validate that JSON again
my_model_2 = BaseModelClass.model_validate_json(my_model.model_dump_json())
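
If you would rather keep plain pd.DataFrame as the underlying type instead of subclassing it, the same idea can probably be expressed with an Annotated alias. The following is only a sketch under that assumption, not part of the original answer: the names parse_frame, JsonDataFrame and ScriptResult are made up for illustration, and unlike the class above it rejects values that are neither a JSON string nor a DataFrame.

```python
from io import StringIO
from typing import Annotated, Any

import pandas as pd
from pydantic import BaseModel, ConfigDict, PlainSerializer, PlainValidator


def parse_frame(value: Any) -> pd.DataFrame:
    """Hypothetical helper: accept a DataFrame or a JSON string."""
    if isinstance(value, pd.DataFrame):
        return value
    if isinstance(value, str):
        # Wrap the literal JSON in StringIO, as newer pandas versions expect
        return pd.read_json(StringIO(value))
    raise ValueError("expected a DataFrame or a JSON string")


# Hypothetical reusable alias: validation and serialization travel with the type
JsonDataFrame = Annotated[
    pd.DataFrame,
    PlainValidator(parse_frame),
    PlainSerializer(lambda df: df.to_json(), return_type=str),
]


class ScriptResult(BaseModel):
    # Kept as a precaution because pd.DataFrame is not a type pydantic knows
    model_config = ConfigDict(arbitrary_types_allowed=True)

    df: JsonDataFrame


sample = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
restored = ScriptResult.model_validate_json(
    ScriptResult(df=sample).model_dump_json()
)
```

Any model that annotates a field with JsonDataFrame gets the conversion without overriding a parsing method, which is what the question asks for. Note that a JSON round trip with the default to_json() orientation may not preserve every dtype or index type exactly.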

Upvotes: 2
