I really like the correlation heatmap, but what I need are the numbers behind it (i.e. the correlation matrix).
Is there an easy way to extract them?
It was a bit hard to track down, but here is the path. Starting from the documentation, specifically the report structure, I dug into the function `get_correlation_items(summary)`. Looking at where it is used in the source leads to a call that essentially loops over each of the correlation types in the summary. The caller of that function is `get_report_structure(summary)`, and tracing where its `summary` argument comes from shows that it is simply the report's `description_set` property.
Given the above, we can now do the following using pandas-profiling version 2.9.0:
```python
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)

profile = ProfileReport(df, title="StackOverflow", explorative=True)
correlations = profile.description_set["correlations"]
print(correlations.keys())
# dict_keys(['pearson', 'spearman', 'kendall', 'phi_k'])
```
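Because `correlations` is a plain dict keyed by method name, with one DataFrame per method, you can loop over it the same way the report code does. The sketch below writes every matrix to its own CSV file. The `export_correlations` helper is hypothetical, and to keep the sketch runnable without pandas_profiling installed it builds a stand-in dict with pandas' own `DataFrame.corr` instead of reading it from a real report.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def export_correlations(correlations, out_dir):
    """Write each correlation matrix to <out_dir>/<method>.csv.

    Assumes `correlations` maps a method name to a square pandas DataFrame,
    as profile.description_set["correlations"] appears to in 2.9.0.
    Returns a dict mapping method name to the written file path.
    """
    paths = {}
    for method, matrix in correlations.items():
        if matrix is None:  # defensive: skip a method that produced no matrix
            continue
        path = os.path.join(out_dir, f"{method}.csv")
        matrix.to_csv(path)
        paths[method] = path
    return paths


# Stand-in for profile.description_set["correlations"], built with plain pandas.
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
correlations = {
    "pearson": df.corr(method="pearson"),
    "spearman": df.corr(method="spearman"),
}

out_dir = tempfile.mkdtemp()
paths = export_correlations(correlations, out_dir)
print(paths)
```

With a real report you would pass `profile.description_set["correlations"]` instead of the stand-in dict.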
To see how one column correlates with the others under a specific method, index into that method's matrix:
```python
correlations["phi_k"]["e"]
# a    0.000000
# b    0.112446
# c    0.289983
# d    0.000000
# e    1.000000
# Name: e, dtype: float64
```
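Each value in `correlations` is a full square matrix, so if you want the numbers as a ranked list rather than a grid, one option is to keep each column pair once and sort by absolute value. This is a minimal sketch: `ranked_pairs` is a hypothetical helper, and the small hand-made matrix stands in for something like `correlations["pearson"]`.

```python
import numpy as np
import pandas as pd


def ranked_pairs(matrix):
    """List each unordered column pair of a square correlation matrix once,
    sorted by absolute correlation, strongest first."""
    cols = list(matrix.columns)
    rows = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, no diagonal
            rows.append((cols[i], cols[j], matrix.loc[cols[i], cols[j]]))
    pairs = pd.DataFrame(rows, columns=["var1", "var2", "value"])
    order = pairs["value"].abs().sort_values(ascending=False).index
    return pairs.loc[order].reset_index(drop=True)


# Hand-made stand-in for a correlation matrix from the report.
matrix = pd.DataFrame(
    [[1.0, 0.2, -0.9],
     [0.2, 1.0, 0.5],
     [-0.9, 0.5, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)

result = ranked_pairs(matrix)
print(result)
# strongest pair first: a-c (-0.9), then b-c (0.5), then a-b (0.2)
```

Sorting on the absolute value keeps strong negative correlations near the top instead of burying them at the bottom.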