Andy

Reputation: 41

Validate pandas.Period with Pandera schema

I have a dataframe with a column "Period" which should have a dtype of pandas.Period.

I would like to validate this, using a Pandera Schema (either DataFrameModel or DataFrameSchema).

My attempts so far return errors.

If I run the code below, I get the following error:

Data type '<class 'pandas._libs.tslibs.period.Period'>' not understood by Engine.

Code:


import pandas as pd
import pandera as pa
from pandera.typing import Series


class Schema(pa.DataFrameModel):
    period: Series[pd.Period]


df = pd.DataFrame({"period": pd.period_range("31/01/2024", "31/12/2024", freq="M")})

Schema.validate(df)

Any advice is much appreciated!

Upvotes: 1

Views: 167

Answers (2)

e-motta

Reputation: 7530

  1. You need to use PeriodDtype.
  2. PeriodDtype takes a freq parameter, which must be specified using typing.Annotated:

from typing import Annotated

import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.DataFrameModel):
    period: Series[Annotated[pd.PeriodDtype, "M"]]  # <= change this

df = pd.DataFrame({"period": pd.period_range("31/01/2024", "31/12/2024", freq="M")})
Schema.validate(df)

You can read more about parametrized dtypes like this one in the pandera documentation.

Upvotes: 1

Wael Jlassi

Reputation: 133

Can you check this solution?

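import pandas as pd
import pandera as pa
from pandera import Check, Column

def is_period(series):
    # True only if every element of the column is a pd.Period scalar
    return series.apply(lambda x: isinstance(x, pd.Period)).all()

# No dtype is given: a period_range column has dtype period[M], not object,
# so Column(object, ...) would fail pandera's dtype check
schema = pa.DataFrameSchema({"period": Column(checks=Check(is_period))})
df = pd.DataFrame({"period": pd.period_range("31/01/2024", "31/12/2024", freq="M")})
schema.validate(df)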

Upvotes: 0
