Nilo Araujo

Reputation: 815

How to validate a dataframe index using SchemaModel in Pandera

I can validate a DataFrame index, including its name, using a DataFrameSchema like this:

import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema(
    columns={
        "column1": pa.Column(int),
    },
    index=pa.Index(int, name="index_name"),
)

# Raises an error as expected, because the index name does not match "index_name"
schema.validate(
    pd.DataFrame({"column1": [1, 2, 3]}, index=pd.Index([1, 2, 3], name="index_incorrect_name"))
)

Is there a way to do the same using a SchemaModel?

Upvotes: 1

Views: 2247

Answers (3)

Arigion

Reputation: 3548

As of pandera 0.14.0, SchemaModel is simply an alias of DataFrameModel. SchemaModel will continue to work as a valid way of specifying types for DataFrame models for the foreseeable future, and will be deprecated in version 0.20.0.

Source: Pandera Documentation

Upvotes: 0

Prashant

Reputation: 155

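You can do it as follows:

import pandas as pd
import pandera as pa
from pandera.typing import Index, Series

class Schema(pa.SchemaModel):
    # check_name=True makes pandera compare the index name with the attribute name ("idx")
    idx: Index[int] = pa.Field(ge=0, check_name=True)
    column1: Series[int]

df = pd.DataFrame({"column1": [1, 2, 3]}, index=pd.Index([1, 2, 3], name="index_incorrect_name"))

# Fails validation here, because the index is named "index_incorrect_name" rather than "idx"
Schema.validate(df)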

Upvotes: 0

Nilo Araujo

Reputation: 815

Found an answer on GitHub.

You can use pa.typing.Index to type-annotate an index.

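import pandera as pa

class Schema(pa.SchemaModel):
    column1: pa.typing.Series[int]
    # The attribute name ("index_name") is the index name that check_name=True enforces
    index_name: pa.typing.Index[int] = pa.Field(check_name=True)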

See how you can validate a MultiIndex index: https://pandera.readthedocs.io/en/stable/schema_models.html#multiindex

Upvotes: 1
