James Challis

Reputation: 3

Great Expectations – Generating Data Doc Without CLI on In-Memory Pandas Dataframe

I am new to the Great Expectations package. I found this tutorial for connecting to a data source, validating the data, and visualising the output as a Data Doc, which is saved as an HTML file. https://docs.greatexpectations.io/docs/tutorials/getting_started/tutorial_setup

However, I am not able to run the CLI commands used in the tutorial. Is there a way to generate the Data Docs shown in that tutorial from a series of expectation results run on an in-memory pandas dataframe?

This article shows how to run expectations on a pandas dataframe that has been read in, with each expectation returning a result dictionary. However, it does not explain how to convert those results into Data Docs. https://towardsdatascience.com/a-great-python-library-great-expectations-6ac6d6fe822e

Minimal Reproducible Example
Python==3.8.15
Packages: 
great-expectations==0.15.41
pandas==1.5.2

import pandas as pd
import great_expectations as gx

# simple dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['a','b','c','d','e']})

# Turn into GX dataframe
df = gx.from_pandas(df)

df.head()
 [enter image description here](https://i.sstatic.net/5IC9R.png)

gx_result = df.expect_column_to_exist("A")

print(gx_result)
 [enter image description here](https://i.sstatic.net/yF3tS.png)

# Code to convert expectation result into data doc

I have also found this piece of documentation on creating Data Docs, but I am unsure how to connect it with the code above. https://docs.greatexpectations.io/docs/terms/data_docs/

Thanks in advance

Upvotes: 0

Views: 977

Answers (1)

Sarang Shinde

Reputation: 737

Hi James, here are the steps to achieve what you are looking for programmatically, without the CLI.

  1. Connect to the in-memory pandas dataframe at runtime using Python. See the "no CLI + no filesystem" tab. https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/in_memory/pandas/

  2. Create a Checkpoint. Use the Python section of this guide and refer to section 5, "Validate data". Change the Spark dataframe to your pandas dataframe wherever applicable. https://docs.greatexpectations.io/docs/deployment_patterns/how_to_use_great_expectations_in_emr_serverless

You will need to combine the code from both guides and adapt it to your own context to get what you want.
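As a rough illustration of how the two steps above might fit together, here is a minimal sketch based on the 0.15.x Python API as I understand it. I have not run it against 0.15.41 specifically, so treat it as a starting point rather than a verified recipe. The helper functions and every name in it (`pandas_runtime`, `runtime_connector`, `run_id`, `my_suite`, `my_checkpoint`, `my_dataframe`) are made up for the example, and it assumes a filesystem-backed context so that the HTML has somewhere to be written.

```python
from typing import Any, Dict, List

# Illustrative names only; pick whatever suits your project.
CONNECTOR_NAME = "runtime_connector"
BATCH_ID_KEY = "run_id"


def make_datasource_config(name: str) -> Dict[str, Any]:
    """Config for a datasource that accepts in-memory pandas dataframes at runtime."""
    return {
        "name": name,
        "class_name": "Datasource",
        "execution_engine": {"class_name": "PandasExecutionEngine"},
        "data_connectors": {
            CONNECTOR_NAME: {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": [BATCH_ID_KEY],
            }
        },
    }


def make_checkpoint_config(name: str, suite_name: str) -> Dict[str, Any]:
    """Config for a SimpleCheckpoint bound to one expectation suite."""
    return {
        "name": name,
        "config_version": 1.0,
        "class_name": "SimpleCheckpoint",
        "expectation_suite_name": suite_name,
    }


def build_data_docs_for(df, root_dir: str) -> List[Dict[str, Any]]:
    """Validate `df` and build Data Docs under `root_dir` (hypothetical helper)."""
    # GX is imported here so the config helpers above can be used without it.
    from great_expectations.core.batch import RuntimeBatchRequest
    from great_expectations.data_context import BaseDataContext
    from great_expectations.data_context.types.base import (
        DataContextConfig,
        FilesystemStoreBackendDefaults,
    )

    # Step 1: a context created in code (no CLI) plus a runtime pandas datasource.
    context = BaseDataContext(
        project_config=DataContextConfig(
            store_backend_defaults=FilesystemStoreBackendDefaults(
                root_directory=root_dir
            )
        )
    )
    context.add_datasource(**make_datasource_config("pandas_runtime"))

    batch_request = RuntimeBatchRequest(
        datasource_name="pandas_runtime",
        data_connector_name=CONNECTOR_NAME,
        data_asset_name="my_dataframe",
        runtime_parameters={"batch_data": df},
        batch_identifiers={BATCH_ID_KEY: "manual_run"},
    )

    # Record the expectations in a suite instead of calling them on gx.from_pandas(df).
    context.create_expectation_suite("my_suite", overwrite_existing=True)
    validator = context.get_validator(
        batch_request=batch_request, expectation_suite_name="my_suite"
    )
    validator.expect_column_to_exist("A")
    validator.save_expectation_suite(discard_failed_expectations=False)

    # Step 2: run a checkpoint against the same in-memory batch.
    context.add_checkpoint(**make_checkpoint_config("my_checkpoint", "my_suite"))
    context.run_checkpoint(
        checkpoint_name="my_checkpoint",
        validations=[{"batch_request": batch_request}],
    )

    # Render the stored validation results to HTML and report where they are.
    context.build_data_docs()
    return context.get_docs_sites_urls()
```

You would call it with the plain pandas dataframe from your example (before the `gx.from_pandas` step), for instance `build_data_docs_for(df, "/tmp/gx_demo")`, and open the returned site URL in a browser. With the filesystem defaults I would expect the HTML to end up under `uncommitted/data_docs/` inside that directory, but check the returned URLs rather than relying on that path.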

Hope it helps.

Upvotes: 0
