Reputation: 41
I have a dataframe with a column "Period" which should have a dtype of pandas.Period.
I would like to validate this, using a Pandera Schema (either DataFrameModel or DataFrameSchema).
My attempts so far return errors.
If I try the code below, I get this error:
Data type '<class 'pandas._libs.tslibs.period.Period'>' not understood by Engine.
Code:
import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.DataFrameModel):
    period: Series[pd.Period]

df = pd.DataFrame({"period": pd.period_range("31/01/2024", "31/12/2024", freq="M")})
Schema.validate(df)
Any advice is much appreciated!
Upvotes: 1
Views: 167
Reputation: 7530
Use pd.PeriodDtype instead of pd.Period. PeriodDtype takes a parameter freq, which needs to be specified using typing.Annotated:

from typing import Annotated

class Schema(pa.DataFrameModel):
    period: Series[Annotated[pd.PeriodDtype, "M"]]  # <= change this

df = pd.DataFrame({"period": pd.period_range("31/01/2024", "31/12/2024", freq="M")})
You can read more about it here.
Upvotes: 1
Reputation: 133
Can you check this solution?
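import pandas as pd
import pandera as pa
from pandera import Column, Check

def is_period(series):
    # True only if every element of the column is a pd.Period
    return series.apply(lambda x: isinstance(x, pd.Period)).all()

# No dtype is passed to Column: the column's dtype is period[M], not object,
# so a dtype check against object would be expected to fail.
schema = pa.DataFrameSchema({"period": Column(checks=Check(is_period))})

df = pd.DataFrame({"period": pd.period_range("31/01/2024", "31/12/2024", freq="M")})
schema.validate(df)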
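A possible variation on the check above is to inspect the Series dtype once, rather than testing every element with apply. The sketch below uses only pandas; has_period_dtype is a hypothetical helper name, not something from pandera.

```python
import pandas as pd

def has_period_dtype(series: pd.Series) -> bool:
    # A period column carries a pd.PeriodDtype; its elements are pd.Period scalars.
    return isinstance(series.dtype, pd.PeriodDtype)

periods = pd.Series(pd.period_range("2024-01", "2024-12", freq="M"))
strings = pd.Series(["2024-01", "2024-02"])

period_ok = has_period_dtype(periods)
string_ok = has_period_dtype(strings)
```

It should be possible to pass this function to Check in place of is_period. Note that it accepts any frequency, so it is looser than declaring the dtype with a specific freq.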
Upvotes: 0