Brian

Reputation: 1645

correlation matrix of one dataframe with another

I was reading through the answers to this question, and a follow-up came up about how to calculate the correlations of all columns of one dataframe with all columns of another. Since it seemed that follow-up wasn't going to get answered, I'm asking it here, as I need exactly that.

So say I have dataframes A and B:

import pandas as pd
import numpy as np

A = pd.DataFrame(np.random.rand(24, 5), columns=list('abcde'))
B = pd.DataFrame(np.random.rand(24, 5), columns=list('ABCDE'))

How do I get a dataframe that looks like this:

pd.DataFrame(index=A.columns, columns=B.columns)

     A    B    C    D    E
a  NaN  NaN  NaN  NaN  NaN
b  NaN  NaN  NaN  NaN  NaN
c  NaN  NaN  NaN  NaN  NaN
d  NaN  NaN  NaN  NaN  NaN
e  NaN  NaN  NaN  NaN  NaN

But filled with the appropriate correlations?

Upvotes: 5

Views: 1859

Answers (1)

piRSquared

Reputation: 294258

One way to do it would be:

pd.concat([A, B], axis=1).corr().filter(B.columns).filter(A.columns, axis=0)
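To see what that one-liner does, here is a minimal sketch that splits it into steps. The seeded generator and the names `full`, `cross`, and `cross_loc` are assumptions added for illustration; they are not part of the original answer. Concatenating the frames gives a 10 x 10 correlation matrix, and the two `filter` calls keep only the block with A's columns as rows and B's columns as columns. Selecting with `.loc` appears to be an equivalent spelling.

```python
import numpy as np
import pandas as pd

# Seeded input so the run is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
A = pd.DataFrame(rng.random((24, 5)), columns=list('abcde'))
B = pd.DataFrame(rng.random((24, 5)), columns=list('ABCDE'))

# Correlate every column with every other column: a 10 x 10 matrix.
full = pd.concat([A, B], axis=1).corr()

# Keep B's columns, then keep A's columns as the row labels.
cross = full.filter(B.columns).filter(A.columns, axis=0)

# The same 5 x 5 block, selected by label in one step.
cross_loc = full.loc[A.columns, B.columns]

print(cross.round(3))
```

Note that this computes all 100 pairwise correlations and then discards 75 of them, which is why it does more work than strictly needed.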


A more efficient way would be:

Az = (A - A.mean())
Bz = (B - B.mean())

Az.T.dot(Bz).div(len(A)).div(Bz.std(ddof=0)).div(Az.std(ddof=0), axis=0)

And you'd get the same as above.
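As a rough check of that claim, the sketch below wraps the centered-product approach in a helper and compares it with the concat-based result. The function name `cross_corr`, the seeded input, and the assumption that both frames share the same index and contain no missing values are mine, not the answer's. The `ddof=0` matters because the cross-products are divided by `len(A)`, so the standard deviations need the same population normalization for the factors to cancel.

```python
import numpy as np
import pandas as pd

def cross_corr(A, B):
    """Pearson correlation of each column of A with each column of B.

    Hypothetical helper: assumes A and B share the same index
    and contain no missing values.
    """
    Az = A - A.mean()  # center each column of A
    Bz = B - B.mean()  # center each column of B
    # Population covariance between A's columns (rows) and B's columns.
    cov = Az.T.dot(Bz).div(len(A))
    # Divide each column by B's std, then each row by A's std.
    return cov.div(Bz.std(ddof=0)).div(Az.std(ddof=0), axis=0)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
A = pd.DataFrame(rng.random((24, 5)), columns=list('abcde'))
B = pd.DataFrame(rng.random((24, 5)), columns=list('ABCDE'))

fast = cross_corr(A, B)
slow = pd.concat([A, B], axis=1).corr().loc[A.columns, B.columns]

# Both approaches should agree up to floating-point error.
same = np.allclose(fast.values, slow.values)
print(same)
```

The helper only computes the 5 x 5 block that is wanted, which is where the efficiency gain over the concat approach would come from.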

Upvotes: 5
