Reputation: 1521
I am trying to compute the correlation of a matrix(here, rows of dataframe) with its transpose using apply
.
The following is the code:
import pandas as pd
from pprint import pprint
d = {'A': [1,0,3,0], 'B':[2,0,1,0], 'C':[0,0,8,0], 'D':[1,0,0,1]}
df = pd.DataFrame(data=d)
df_T = df.T
corr = df.apply(lambda s: df_T.corrwith(s))
All the columns of correlation variable contains NaN entries. I'd like to understand why NaN occurs.
Could someone explain?
Upvotes: 1
Views: 582
Reputation: 862581
I think you need DataFrame.corr
:
print (df.corr())
A B C D
A 1.000000 0.492366 0.942809 -0.408248
B 0.492366 1.000000 0.174078 0.301511
C 0.942809 0.174078 1.000000 -0.577350
D -0.408248 0.301511 -0.577350 1.000000
If need your solution is necessary same index and columns values:
df = pd.DataFrame(data=d).set_index(df.columns)
print (df)
A B C D
A 1 2 0 1
B 0 0 0 0
C 3 1 8 0
D 0 0 0 1
df_T = df.T
corr = df.apply(lambda s: df_T.corrwith(s))
print (corr)
A B C D
A -0.866025 -0.426401 -0.816497 0.000000
B NaN NaN NaN NaN
C 0.993399 0.489116 0.936586 -0.486664
D -0.471405 -0.522233 -0.333333 0.577350
Upvotes: 1