OCTAVIAN

Reputation: 336

Speeding up pandas profiling analysis using check_correlation?

I'm using pandas profiling to generate a report. The dataset is very large, so to speed up processing I'm trying to turn off correlations. I tried the check_correlation parameter from another post I saw, but I get an error from this line:

a = prof.ProfileReport(df, title='Downloads', check_correlation=False)

which raises this error:

ValueError: Config parameter "check_correlation" does not exist.

Upvotes: 10

Views: 5995

Answers (4)

petezurich

Reputation: 10224

As of version 3.6, you can do this:

import pandas_profiling  # importing registers the DataFrame.profile_report method

profile = df.profile_report(
    title="Report without correlations",
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)

# or using a shorthand that is available for correlations
profile = df.profile_report(
    title="Report without correlations",
    correlations=None,
)

See also the docs here.
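Writing out {"calculate": False} for every method gets repetitive. As a minimal sketch, the same correlations setting can be built with a dict comprehension. The helper name disabled_correlations and the CORRELATION_METHODS tuple are hypothetical (not part of pandas-profiling); the method names are simply the ones listed in the answer above, and whether your installed version accepts all of them is an assumption you should check against its docs.

```python
# Hypothetical helper, not part of pandas-profiling: builds the 3.x-style
# "correlations" setting with every listed method switched off.
CORRELATION_METHODS = ("auto", "pearson", "spearman", "kendall", "phi_k", "cramers")


def disabled_correlations(methods=CORRELATION_METHODS):
    """Return {"<method>": {"calculate": False}, ...} for each method name."""
    # A fresh inner dict is created per method, so entries are independent.
    return {name: {"calculate": False} for name in methods}


if __name__ == "__main__":
    print(disabled_correlations()["pearson"])  # {'calculate': False}
```

It would then be passed as df.profile_report(title="Report without correlations", correlations=disabled_correlations()).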

Upvotes: 0

Romeu Fronzaroli

Reputation: 129

This approach didn't work for me, so I used:

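import pandas_profiling as prof

# minimal=True switches off expensive computations, including correlations
a = prof.ProfileReport(df, title='Downloads', minimal=True)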

Upvotes: 4

Levent

Reputation: 56

Since the configuration options changed in version 2, you can use:

import pandas_profiling

profile = df.profile_report(check_correlation_pearson=False,
                            correlations={'pearson': False,
                                          'spearman': False,
                                          'kendall': False,
                                          'phi_k': False,
                                          'cramers': False,
                                          'recoded': False})

to turn off correlations. However, it is still not as fast as version 1.4. You could also investigate other configurations here.
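In the 2.x style shown in this answer, each correlation method maps to a plain boolean rather than a nested dict. As a minimal sketch, that mapping can be generated with dict.fromkeys instead of being typed out. The helper v2_correlations_off and the V2_CORRELATION_METHODS tuple are hypothetical names for illustration; the method names are copied from the answer, and which ones a particular 2.x release accepts is an assumption.

```python
# Hypothetical helper, not part of pandas-profiling: builds the flat,
# boolean-valued "correlations" mapping used in this answer for 2.x.
V2_CORRELATION_METHODS = ("pearson", "spearman", "kendall", "phi_k", "cramers", "recoded")


def v2_correlations_off(methods=V2_CORRELATION_METHODS):
    """Return {"<method>": False, ...} so every listed correlation is skipped."""
    # dict.fromkeys is safe here because False is immutable.
    return dict.fromkeys(methods, False)


if __name__ == "__main__":
    print(v2_correlations_off())
```

Usage would then look like df.profile_report(correlations=v2_correlations_off()).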

Upvotes: 4

knagaev

Reputation: 2967

Please see this issue in the pandas-profiling project.

Upvotes: 0
