Reputation: 3630
I'm trying to find the highest correlations between different columns with pandas. I know I can get the correlation matrix with
df.corr()
I know I can get the highest correlations after that with
s = df.corr().stack()
s = s.sort_values()
s.iloc[-5:]
The problem is that these correlations also include each column's correlation with itself (always 1). How do I remove these self-correlations? I could remove all values equal to 1, but I don't want to do that because there might be genuine correlations of 1 between different columns too.
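To make the problem concrete, here is a minimal sketch with a small hypothetical DataFrame (column names and values are made up) in which column `b` is an exact copy of `a`. After stacking, the self-pairs and the genuine `a`/`b` pair all have the value 1, so a filter on the value alone cannot tell them apart.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is an exact copy of "a", so corr(a, b) is a genuine 1
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
})

# One entry per (row label, column label) pair, self-pairs such as ("a", "a") included
s = df.corr().stack().sort_values()

# The five largest entries: three self-pairs plus ("a", "b") and ("b", "a")
top = s.iloc[-5:]
print(top)

# Dropping every value equal to 1 (isclose tolerates rounding) also drops the genuine pair
by_value = s[~np.isclose(s, 1.0)]
print(("a", "b") in by_value.index)
```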
Upvotes: 18
Views: 14068
Reputation: 1070
Another solution is to use stack:
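corr = df.corr()
s = corr.stack()
# remove entries equal to 1 (note: this also removes genuine correlations of 1 between different columns)
s = s[s != 1]
# convert back to a matrix; removed entries become NaN
s.unstack()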
or
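import numpy as np
# set the lower triangle, including the diagonal (k=0), to NaN so each pair appears only once
corr.values[np.tril_indices_from(corr.values, k=0)] = np.nan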
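A minimal sketch of the same triangle idea that keeps the strict upper triangle without writing into `corr.values`, using `DataFrame.where` with a boolean mask built by `np.triu`. The data and the top-3 cut-off are hypothetical; `.dropna()` is added explicitly because newer pandas versions of `stack` may keep NaN entries.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "b" duplicates "a", so the a-b correlation is a genuine 1
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
    "d": [5.0, 3.0, 4.0, 1.0, 2.0],
})
corr = df.corr()

# True strictly above the diagonal (k=1): each unordered pair once, no self-pairs
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# where() keeps values where the mask is True and puts NaN elsewhere (returns a new frame)
pairs = corr.where(upper).stack().dropna()

# Highest correlations first
top3 = pairs.sort_values(ascending=False).head(3)
print(top3)
```

Because the mask is positional rather than value-based, the genuine `a`/`b` correlation of 1 is kept.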
Upvotes: 0
Reputation: 5982
Fill the diagonal with NaN rather than a fake number:
import numpy as np
corr_matrix = df.corr()
np.fill_diagonal(corr_matrix.values, np.nan)  # modifies corr_matrix in place
Both seaborn and plotly heatmaps handle NaN entries, so the resulting correlation matrix can still be plotted.
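A minimal sketch of the NaN approach that builds a new frame with `DataFrame.mask` instead of writing into `corr_matrix.values`; the data is hypothetical. Since `idxmax` and `max` skip NaN by default, they then return each column's most correlated *other* column, and a genuine correlation of 1 is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "b" duplicates "a"
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
    "d": [5.0, 3.0, 4.0, 1.0, 2.0],
})
corr_matrix = df.corr()

# mask() replaces entries where the condition is True (here: the diagonal) with NaN
corr_matrix = corr_matrix.mask(np.eye(len(corr_matrix), dtype=bool))

# For each column, the label and value of its highest correlation with another column
best_partner = corr_matrix.idxmax()
best_value = corr_matrix.max()
print(pd.DataFrame({"partner": best_partner, "corr": best_value}))
```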
Upvotes: 2
Reputation: 3630
I recently found an even cleaner answer to my question: you can compare the MultiIndex levels by value.
This is what I ended up using:
corr = df.corr().stack()
# keep only pairs whose two labels differ, i.e. drop each column's correlation with itself
corr = corr[corr.index.get_level_values(0) != corr.index.get_level_values(1)]
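The level comparison also makes it easy to keep each unordered pair only once. A minimal sketch, assuming hypothetical data and a top-3 cut-off: comparing the labels with `<` instead of `!=` drops both the self-pairs and the mirrored duplicates such as `("b", "a")`, and `Series.nlargest` then picks the highest correlations. Because the filter looks only at labels, the genuine `a`/`b` correlation of 1 survives.

```python
import pandas as pd

# Hypothetical data; "b" duplicates "a"
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
    "d": [5.0, 3.0, 4.0, 1.0, 2.0],
})

corr = df.corr().stack()
level0 = corr.index.get_level_values(0)
level1 = corr.index.get_level_values(1)

# "<" removes self-pairs (equal labels) and keeps only one of ("a", "b") / ("b", "a")
unique_pairs = corr[level0 < level1]

# The three highest correlations between distinct columns
top3 = unique_pairs.nlargest(3)
print(top3)
```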
Upvotes: 11
Reputation: 76297
Say you have
corrs = df.corr()
Then the problem is with the diagonal elements, IIUC. You can easily set them to some negative value, say -2 (necessarily lower than any correlation, since correlations lie in [-1, 1]), with
np.fill_diagonal(corrs.values, -2)
Example
(Many thanks to @Fabian Rost for the improvement & @jezrael for the DataFrame)
import numpy as np
import pandas as pd
df = pd.DataFrame({
'one':[0.1, .32, .2, 0.4, 0.8],
'two':[.23, .18, .56, .61, .12],
'three':[.9, .3, .6, .5, .3],
'four':[.34, .75, .91, .19, .21],
'zive': [0.1, .32, .2, 0.4, 0.8],
'six':[.9, .3, .6, .5, .3],
'drive':[.9, .3, .6, .5, .3]})
corrs = df.corr()
np.fill_diagonal(corrs.values, -2)
>>> corrs
          drive      four       one       six     three       two      zive
drive -2.000000 -0.039607 -0.747365  1.000000  1.000000  0.238102 -0.747365
four  -0.039607 -2.000000 -0.489177 -0.039607 -0.039607  0.159583 -0.489177
one   -0.747365 -0.489177 -2.000000 -0.747365 -0.747365 -0.351531  1.000000
six    1.000000 -0.039607 -0.747365 -2.000000  1.000000  0.238102 -0.747365
three  1.000000 -0.039607 -0.747365  1.000000 -2.000000  0.238102 -0.747365
two    0.238102  0.159583 -0.351531  0.238102  0.238102 -2.000000 -0.351531
zive  -0.747365 -0.489177  1.000000 -0.747365 -0.747365 -0.351531 -2.000000
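With the diagonal at -2, sorting the stacked matrix pushes the self-correlations to the bottom, so the genuine correlations of 1 in this example (`one`/`zive` and the identical `drive`/`six`/`three` columns) come out on top. A minimal sketch using the DataFrame above; it swaps in `DataFrame.mask(..., other=-2)` to build a new frame instead of writing into `corrs.values`, which is a substitution rather than the answer's own code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one': [0.1, .32, .2, 0.4, 0.8],
    'two': [.23, .18, .56, .61, .12],
    'three': [.9, .3, .6, .5, .3],
    'four': [.34, .75, .91, .19, .21],
    'zive': [0.1, .32, .2, 0.4, 0.8],
    'six': [.9, .3, .6, .5, .3],
    'drive': [.9, .3, .6, .5, .3]})

# Same effect as np.fill_diagonal(corrs.values, -2), but returns a new DataFrame
corrs = df.corr().mask(np.eye(len(df.columns), dtype=bool), other=-2)

# Highest correlations first; the -2 diagonal sinks to the bottom
ranked = corrs.stack().sort_values(ascending=False)
print(ranked.head(8))
```

Each pair still appears twice (for example `("one", "zive")` and `("zive", "one")`), because the matrix is symmetric.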
Upvotes: 13