Reputation: 83
I am handling a large dataset and have used the Pandas Profiling package. Because the dataset is large, the report takes too long to generate and browsers fail to open it. So I used the "minimal=True" option, which excludes the correlation matrices and the scatter plots. Is there any way to generate only the correlation matrices and scatter plots using Pandas Profiling?
from pandas_profiling import ProfileReport
profile = ProfileReport(df, title='EDA_Raw_Data', html={'style':{'full_width':True}},minimal=True)
profile.to_file(output_file="EDA1_Raw_Data.html")
Upvotes: 2
Views: 2483
Reputation: 5708
This is partially possible.
To configure pandas-profiling to present only scatter plots (or hexbins) and correlation plots, you can start from the minimal configuration.
Then change that configuration to exclude the computations you would like to disable (e.g. set samples to zero), save it as a YAML file, and pass the file to ProfileReport:
from pandas_profiling import ProfileReport
profile = ProfileReport(df, config_file="your_config.yml")
profile.to_file("EDA1_Raw_Data.html")
Note that, as of v2.6.0, it is not possible to disable all calculations. Please open a feature request at the repository for that.
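Since not every calculation can be switched off, one possible fallback is to compute the correlation matrix and a downsampled frame for scatter plots with plain pandas, outside pandas-profiling. This is only a sketch of that swapped-in approach, not a pandas-profiling feature: the helper names `correlation_matrix` and `scatter_sample`, the synthetic `df`, and the sample size of 500 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def correlation_matrix(frame: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Hypothetical helper: correlation matrix over the numeric columns only."""
    numeric = frame.select_dtypes(include="number")
    return numeric.corr(method=method)


def scatter_sample(frame: pd.DataFrame, n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: downsample numeric columns so plots stay light."""
    numeric = frame.select_dtypes(include="number")
    return numeric.sample(n=min(len(numeric), n), random_state=seed)


# Synthetic stand-in for the large dataset in the question (an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=1000),
    "b": rng.normal(size=1000),
    "label": ["x", "y"] * 500,  # non-numeric column, skipped by both helpers
})
df["c"] = 2 * df["a"] + rng.normal(scale=0.1, size=1000)

corr = correlation_matrix(df)
sample = scatter_sample(df, n=500)
print(corr.round(3))
```

If matplotlib is installed, `pd.plotting.scatter_matrix(sample)` should draw the pairwise scatter plots from the downsampled frame. Sampling trades some detail for a figure that stays small enough to render.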
(Disclaimer: author here. Note that the upcoming v2.7.0 includes significant performance improvements that might also resolve your issue.)
Upvotes: 2