Reputation: 3
I am new to the Great Expectations package. I found this tutorial for connecting to a data source, validating the data, and visualising the output as a Data Doc saved as HTML: https://docs.greatexpectations.io/docs/tutorials/getting_started/tutorial_setup
However, I am not able to run the CLI commands used in the tutorial. Is there a way to generate the Data Docs shown in that tutorial from a series of expectation results run on an in-memory pandas dataframe?
This article shows how to run expectations on a pandas dataframe that has been read in, with each expectation returning a result dictionary. However, it does not explain how to convert those results into Data Docs: https://towardsdatascience.com/a-great-python-library-great-expectations-6ac6d6fe822e
Minimal Reproducible Example
Python==3.8.15
Packages:
great-expectations==0.15.41
pandas==1.5.2
import pandas as pd
import great_expectations as gx
# simple dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['a', 'b', 'c', 'd', 'e']})
# Turn into GX dataframe
df = gx.from_pandas(df)
df.head()
[enter image description here](https://i.sstatic.net/5IC9R.png)
gx_result = df.expect_column_to_exist("A")
print(gx_result)
[enter image description here](https://i.sstatic.net/yF3tS.png)
# Code to convert expectation result into data doc
I have also found this documentation on creating Data Docs, but I am unsure how to connect it with the code above: https://docs.greatexpectations.io/docs/terms/data_docs/
Thanks in advance
Upvotes: 0
Views: 977
Reputation: 737
Hi James, the following steps should achieve what you are looking for programmatically.
Connect to the in-memory pandas dataframe at runtime using Python. See the "no CLI + no filesystem" tab: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/in_memory/pandas/
Create a Checkpoint. Use the Python section and refer to section 5, "Validate data", changing the Spark dataframe to a pandas dataframe wherever applicable: https://docs.greatexpectations.io/docs/deployment_patterns/how_to_use_great_expectations_in_emr_serverless.
You will need to combine the code from both guides and adapt it to your own context to achieve what you want.
Hope it helps.
Upvotes: 0