Natasha

Reputation: 1521

Computing correlation of a matrix with its transpose

I am trying to compute the correlation of a matrix (here, the rows of a DataFrame) with its transpose using apply.

The following is the code:

import pandas as pd
from pprint import pprint
d = {'A': [1,0,3,0], 'B':[2,0,1,0], 'C':[0,0,8,0], 'D':[1,0,0,1]}
df = pd.DataFrame(data=d)
df_T = df.T  
corr = df.apply(lambda s: df_T.corrwith(s))

Every column of the resulting corr DataFrame contains only NaN entries. I'd like to understand why the NaN values occur.

Could someone explain?

Upvotes: 1

Views: 582

Answers (1)

jezrael

Reputation: 862581

I think you need DataFrame.corr:

print (df.corr())
          A         B         C         D
A  1.000000  0.492366  0.942809 -0.408248
B  0.492366  1.000000  0.174078  0.301511
C  0.942809  0.174078  1.000000 -0.577350
D -0.408248  0.301511 -0.577350  1.000000
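
As a quick cross-check, DataFrame.corr computes the pairwise Pearson correlation between columns, so it should agree with NumPy's np.corrcoef when the columns are treated as variables. A minimal sketch, reusing the d from the question (the names pandas_corr and numpy_corr are only for illustration):

```python
import numpy as np
import pandas as pd

d = {'A': [1, 0, 3, 0], 'B': [2, 0, 1, 0], 'C': [0, 0, 8, 0], 'D': [1, 0, 0, 1]}
df = pd.DataFrame(data=d)

# Column-to-column Pearson correlation, labelled by column name.
pandas_corr = df.corr()

# Same computation on the raw values; rowvar=False treats columns as variables.
numpy_corr = np.corrcoef(df.to_numpy(), rowvar=False)

print(np.allclose(pandas_corr.to_numpy(), numpy_corr))
```

If the correlation between rows is wanted instead, df.T.corr() applies the same computation to the transposed frame.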

If you need your own solution, the index and column labels must be the same, because corrwith aligns on labels:

df = pd.DataFrame(data=d).set_index(df.columns)
print (df)
   A  B  C  D
A  1  2  0  1
B  0  0  0  0
C  3  1  8  0
D  0  0  0  1

df_T = df.T  

corr = df.apply(lambda s: df_T.corrwith(s))
print (corr)
          A         B         C         D
A -0.866025 -0.426401 -0.816497  0.000000
B       NaN       NaN       NaN       NaN
C  0.993399  0.489116  0.936586 -0.486664
D -0.471405 -0.522233 -0.333333  0.577350
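
As for why the original code returns only NaN: corrwith matches values by label, not by position. In the question, each column s of df is indexed 0..3, while df_T is indexed A..D, so no labels are shared and every correlation is computed on an empty set. The sketch below illustrates this under the same data; df_labelled, unaligned, and aligned are illustrative names, and assigning df.columns to the index is just one way to make the labels match.

```python
import pandas as pd

d = {'A': [1, 0, 3, 0], 'B': [2, 0, 1, 0], 'C': [0, 0, 8, 0], 'D': [1, 0, 0, 1]}
df = pd.DataFrame(data=d)

# Original setup: s is indexed 0..3, df.T is indexed 'A'..'D'.
s = df['A']
shared_labels = set(df.T.index) & set(s.index)
unaligned = df.T.corrwith(s)
print(shared_labels)            # no common labels
print(unaligned.isna().all())   # every entry is NaN

# With matching labels, the rows of df line up with the values of s.
df_labelled = df.copy()
df_labelled.index = df_labelled.columns
aligned = df_labelled.T.corrwith(df_labelled['A'])
print(aligned)
```

Row B stays NaN even after relabelling because that row of df is all zeros: a constant series has zero variance, so its Pearson correlation is undefined.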

Upvotes: 1
