Walter Kelt

Reputation: 2279

Pandera: Is cell based dataframe data validation possible?

Every row of my dataframe contains a record with a unique key combination. The data validation will depend on both the column and the key combination. For example, within a single column, cells may have different min/max requirements depending on the key combination of their row.

Several questions:

  1. Can Pandera validate on a per-cell basis as opposed to a per-column basis?
  2. Does Pandera have a schema generator capable of this type of flexibility? Perhaps it scans a "golden dataframe" as a starting place to create a schema based on some provided criteria. I realize the schema generator output may need a bit of tweaking.

The library does look cool, and I am interested in pursuing it further.

Thanks

Upvotes: 0

Views: 700

Answers (1)

cosmicBboy

Reputation: 169

You can create a validator that checks a single value at a time with the element_wise=True kwarg. The Pandera documentation on checks has more detail.

import pandera as pa

check = pa.Check(lambda x: 0 <= x <= 100, element_wise=True)

The function must take an individual value as input and output a boolean.

Can you elaborate on the exact check that you want to perform? If you want a row-wise check, you can pass an element-wise check at the dataframe level (a "wide" check). The check function then receives one row at a time, so it can look at the key columns and the value together.

does Pandera have a schema generator capable of this type of flexibility. Perhaps it scans a "golden dataframe" as a starting place to create a schema based on some provided criteria. I realize the schema generator output may need a bit of tweaking.

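You can call schema = pandera.infer_schema(golden_dataframe) to bootstrap a starter schema, then write it out to a file with schema.to_script("path/to/file") and iterate on it from there. The inferred schema is column-based, so any key-dependent checks would still need to be added by hand.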

Upvotes: 2
