CHW

Reputation: 31

How can I get the numbers for the correlation matrix from Pandas Profiling?

I really like the heatmap, but what I need are the numbers behind it (i.e. the correlation matrix).

Is there an easy way to extract the numbers?

Upvotes: 2

Views: 1961

Answers (1)

Tristian

Reputation: 3512

It was a bit hard to track down. Starting from the documentation, specifically the report structure, I dug into the function get_correlation_items(summary). Looking at its usage in the source, it essentially loops over each of the correlation types in the summary. To find out where the summary object comes from, look up the caller: it is get_report_structure(summary), and the summary argument passed to it is simply the description_set property of the report.

Given the above, we can now do the following using version 2.9.0:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)

profile = ProfileReport(df, title="StackOverflow", explorative=True)

correlations = profile.description_set["correlations"]
print(correlations.keys())
dict_keys(['pearson', 'spearman', 'kendall', 'phi_k'])

To see a specific correlation do:

correlations["phi_k"]["e"]
a    0.000000
b    0.112446
c    0.289983
d    0.000000
e    1.000000
Name: e, dtype: float64
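As a cross-check, or if you only need the standard coefficients, pandas can compute the matrix directly with DataFrame.corr. This is only a sketch of an alternative, not what the profiler does internally: I have not verified that the numbers match the report exactly, pandas has no phi_k method, and the seed below is an arbitrary choice so the run is repeatable.

```python
import numpy as np
import pandas as pd

# Same shape of toy data as above; the seed is an arbitrary assumption.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=["a", "b", "c", "d", "e"])

# DataFrame.corr returns the full correlation matrix as a DataFrame.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# One column of the matrix: correlation of every variable with "e".
print(pearson["e"])
```

Each result is an ordinary DataFrame, so the usual methods such as round or to_csv work on it.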

Upvotes: 3
