How to define a Pandera DataFrame schema for validating and parsing datetime columns?

Question

I have a CSV file that contains datetime columns, and I want to use Pandera to validate these columns and parse them into the correct type. An example value from one of the columns is 2023-02-04T00:39:00+00:00.

Currently I parse this correctly in pandas with the following Python code:

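import pandas as pd

# %z consumes the "+00:00" UTC offset, so the parsed values are tz-aware
column = pd.to_datetime(column, format="%Y-%m-%dT%H:%M:%S%z")
# Convert the tz-aware timestamps to Amsterdam local time
column = column.dt.tz_convert("Europe/Amsterdam")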
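Because the sample value ends in a UTC offset (+00:00), the format string needs %z for the parsed result to be tz-aware, and tz_convert only works on tz-aware data. Below is a minimal sketch that wraps the two steps in a function and applies it to the sample value. The function name parse_amsterdam is hypothetical.

```python
import pandas as pd


def parse_amsterdam(raw: pd.Series) -> pd.Series:
    """Parse ISO-8601 strings that carry a UTC offset and convert them to Amsterdam time."""
    # %z matches the "+00:00" suffix, so the result is tz-aware
    parsed = pd.to_datetime(raw, format="%Y-%m-%dT%H:%M:%S%z")
    # tz_convert shifts the wall-clock time into the target zone
    return parsed.dt.tz_convert("Europe/Amsterdam")


raw = pd.Series(["2023-02-04T00:39:00+00:00"])
print(parse_amsterdam(raw).iloc[0])
```

In February, Amsterdam is on CET (UTC+1), so 00:39 UTC should come out as 01:39 local time with a +01:00 offset.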

I want to define a Pandera DataFrameSchema so that this parsing is handled "automatically" when I read the CSV with the following code:

import pandas as pd
from pandera import Column, DataFrameSchema

schema = DataFrameSchema(
    {
        "datetime_column": Column()  # how do I implement the parsing above here?
    },
    strict=True,
    coerce=False,
)

df = pd.read_csv(src, dtype={col: str(dtype) for col, dtype in schema.dtypes.items()})
schema.validate(df)

I already use this approach for simple types such as strings and ints, but how would I do it for datetime types, which are usually tz-aware?

There isn't much documentation on this, and so far I haven't been able to figure it out from the online docs.
