Moinak Dey

Reputation: 83

How to generate only the correlations and scatter plots using the Pandas Profiling package?

I am working with a large dataset and I have used the Pandas Profiling package. Because the dataset is large, the report takes too long to generate and browsers fail to open it. So I used the `minimal=True` option, which excludes the correlation matrices and the scatter plots. Is there any way to generate only the correlation matrices and scatter plots using Pandas Profiling?

from pandas_profiling import ProfileReport
# df is the (large) pandas DataFrame being profiled
profile = ProfileReport(df, title='EDA_Raw_Data', html={'style': {'full_width': True}}, minimal=True)
profile.to_file(output_file="EDA1_Raw_Data.html")

Upvotes: 2

Views: 2483

Answers (1)

Simon

Reputation: 5708

This is partially possible.

To configure pandas-profiling so that it only presents scatter plots (or hexbins) and correlation plots, you can start from the minimal configuration file:

https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_minimal.yaml

Then, edit that configuration: re-enable the correlations and the interactions (scatter plots), and disable any other computation you do not need (e.g. set samples to zero).
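If you prefer to make those edits programmatically rather than by hand, here is a minimal sketch. It assumes the v2.x layout of the YAML config (`correlations.<method>.calculate`, `interactions.continuous`, `samples.head`/`samples.tail`) and the correlation method names listed in `CORRELATION_METHODS`; both are assumptions, so check them against the file you copied. The function name `correlations_and_scatter_only` is hypothetical and not part of pandas-profiling.

```python
import copy

# Correlation method keys assumed from the v2.x config files (verify in your copy).
CORRELATION_METHODS = ("pearson", "spearman", "kendall", "phi_k", "cramers")


def correlations_and_scatter_only(config: dict) -> dict:
    """Hypothetical helper: return a copy of a pandas-profiling config dict
    with correlations and scatter plots switched on and samples switched off.

    The key layout used here is an assumption based on the v2.x YAML files.
    """
    cfg = copy.deepcopy(config)  # never mutate the caller's dict

    # Re-enable every correlation matrix, keeping other per-method settings.
    correlations = cfg.setdefault("correlations", {})
    for method in CORRELATION_METHODS:
        correlations.setdefault(method, {})["calculate"] = True

    # Scatter plots (or hexbins) of continuous pairs are under "interactions".
    cfg.setdefault("interactions", {})["continuous"] = True

    # Example of disabling a computation: show zero sample rows.
    cfg["samples"] = {"head": 0, "tail": 0}
    return cfg
```

To use it, load `config_minimal.yaml` with PyYAML's `yaml.safe_load`, pass the resulting dict through the function, and write the output to `your_config.yml` with `yaml.safe_dump`. Working on a deep copy keeps the original minimal configuration intact in case you want to compare the two.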

from pandas_profiling import ProfileReport
# config_file points to the edited YAML (it cannot be combined with minimal=True)
profile = ProfileReport(df, config_file="your_config.yml")
profile.to_file("EDA1_Raw_Data.html")

Note that, as of v2.6.0, it is not possible to disable all other calculations. Please open a feature request in the repository if you need that.

(Disclaimer: Author here. Note that the upcoming v2.7.0 includes significant performance improvements that might also resolve your issue.)

Upvotes: 2
