Reputation: 1645
I was reading through the answers to this question. A follow-up question then came up about how to calculate the correlations of all columns of one dataframe with all columns of another dataframe. Since it seemed that question wasn't going to get answered, I wanted to ask it myself, as I need something just like that.
So say I have dataframes A and B:
import pandas as pd
import numpy as np
A = pd.DataFrame(np.random.rand(24, 5), columns=list('abcde'))
B = pd.DataFrame(np.random.rand(24, 5), columns=list('ABCDE'))
How do I get a dataframe that looks like this:
pd.DataFrame([], A.columns, B.columns)
     A    B    C    D    E
a  NaN  NaN  NaN  NaN  NaN
b  NaN  NaN  NaN  NaN  NaN
c  NaN  NaN  NaN  NaN  NaN
d  NaN  NaN  NaN  NaN  NaN
e  NaN  NaN  NaN  NaN  NaN
But filled with the appropriate correlations?
Upvotes: 5
Views: 1859
Reputation: 294258
One way to do it would be:
pd.concat([A, B], axis=1).corr().filter(B.columns).filter(A.columns, axis=0)
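A more efficient way, which skips the A-with-A and B-with-B correlations that the first approach computes and then discards, would be: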
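# Center each column by subtracting its mean
Az = A - A.mean()
Bz = B - B.mean()
# Cross-covariance (ddof=0) divided by the population standard deviations
Az.T.dot(Bz).div(len(A)).div(Bz.std(ddof=0)).div(Az.std(ddof=0), axis=0)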
And you'd get the same result as above, up to floating-point rounding.
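To check that the two approaches agree, here is a minimal sketch that wraps each one in a function and compares the results. The function names `cross_corr_concat` and `cross_corr_direct` are made up for illustration, and a seeded generator replaces `np.random.rand` only so the run is repeatable.

```python
import numpy as np
import pandas as pd


def cross_corr_concat(A, B):
    # Correlate every column with every other column, then keep only
    # the block with A's columns as rows and B's columns as columns.
    full = pd.concat([A, B], axis=1).corr()
    return full.filter(B.columns).filter(A.columns, axis=0)


def cross_corr_direct(A, B):
    # Center each column, take cross-products, and divide by n to get
    # the population cross-covariance between A's and B's columns.
    Az = A - A.mean()
    Bz = B - B.mean()
    cov = Az.T.dot(Bz).div(len(A))
    # Divide columns by B's std devs and rows by A's (both with ddof=0).
    return cov.div(Bz.std(ddof=0)).div(Az.std(ddof=0), axis=0)


rng = np.random.default_rng(0)  # seeded only for repeatability
A = pd.DataFrame(rng.random((24, 5)), columns=list('abcde'))
B = pd.DataFrame(rng.random((24, 5)), columns=list('ABCDE'))

slow = cross_corr_concat(A, B)
fast = cross_corr_direct(A, B)

# Compare the two 5x5 results within floating-point tolerance.
print(np.allclose(slow, fast))
```

Two caveats, as far as I can tell: the direct version assumes A and B share the same row index and contain no missing values. `DataFrame.corr` drops NaNs pairwise, while the dot product propagates them, so the two results can differ on data with gaps.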
Upvotes: 5