Reputation: 165
I have two DataFrames with a structure like this:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(8, 6), columns=['T', 'U', 'V', 'X', 'Y', 'Z'])
I would like to compute the Pearson correlation between every column of df1 and every column of df2, then combine all the results into one correlation matrix.
A similar question has been asked before, but my df1 has several columns:
Correlation between two dataframes
Any help on how to do this will be great.
Upvotes: 0
Views: 138
Reputation: 22979
Compute it directly:
# center and standardize each column (np.std defaults to ddof=0,
# which is consistent with dividing by len(df1) below)
df1vals = (df1.values - df1.values.mean(axis=0)) / df1.values.std(axis=0)
df2vals = (df2.values - df2.values.mean(axis=0)) / df2.values.std(axis=0)
# Pearson correlation of every df1 column with every df2 column
pearsons = df1vals.T.dot(df2vals) / len(df1)
This has shape (df1.shape[1], df2.shape[1]): one row per column of df1 and one column per column of df2.
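If you want the column labels back, a minimal sketch (assuming the df1, df2 and pearsons variables from the snippets above) is to wrap the raw array in a DataFrame:
# label the correlation array with the original column names
corr = pd.DataFrame(pearsons, index=df1.columns, columns=df2.columns)
print(corr.shape)  # (4, 6) for the example frames above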
If you really need to use corrwith, then:
pd.concat(
    [df1.corrwith(df2[c]) for c in df2],
    axis=1, keys=df2.columns
)
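As a sanity check (a sketch, assuming the variables defined above), the two approaches should agree up to floating-point noise, since corrwith also computes Pearson correlation by default:
via_corrwith = pd.concat(
    [df1.corrwith(df2[c]) for c in df2],
    axis=1, keys=df2.columns
)
# compare the corrwith result against the matrix-product result
assert np.allclose(pearsons, via_corrwith.values)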
Upvotes: 1