Reputation:
I have a set of columns (col1,col2,col3) in dataframe df1 I have another set of columns (col4,col5,col6) in dataframe df2 Assume this two dataframes has the same number of rows.
How do I generate a correlation table that do pairwise correlation between df1 and df2?
the table will look like
col1 col2 col3
col4 .. .. ..
col5 .. .. ..
col6 .. .. ..
I use df1.corrwith(df2)
, it does not seem to generate the table as required.
I have seen the answer at How to check correlation between matching columns of two data sets?, but the main difference is that the col names does not matched.
Upvotes: 9
Views: 12612
Reputation: 294228
pandas
quick and dirty
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']
numpy
clean
def corr(df1, df2):
n = len(df1)
v1, v2 = df1.values, df2.values
sums = np.multiply.outer(v2.sum(0), v1.sum(0))
stds = np.multiply.outer(v2.std(0), v1.std(0))
return pd.DataFrame((v2.T.dot(v1) - sums / n) / stds / n,
df2.columns, df1.columns)
corr(df1, df2)
example
df1 = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))
df2 = pd.DataFrame(np.random.rand(10, 3), columns=list('xyz'))
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']
a b c d
x 0.235624 0.844665 -0.647962 0.535562
y 0.357994 0.462007 0.205863 0.424568
z 0.688853 0.350318 0.132357 0.687038
corr(df1, df2)
a b c d
x 0.235624 0.844665 -0.647962 0.535562
y 0.357994 0.462007 0.205863 0.424568
z 0.688853 0.350318 0.132357 0.687038
Upvotes: 25