pandas find row+col names for elements with condition

Question

I have a pandas dataframe of correlations. I'm searching through the dataframe to find correlations greater than the threshold 0.5 then return the column name and row name (both are strings) for those elements that match the condition.

Currently, I can transform the correlation matrix to values greater than the threshold, then all other values are NaN using this:

correlations[np.abs(correlations) > 0.5]

Now that I have this matrix of numbers that match my query and NaNs, how do I get the row+col names of each element whose value is not NaN?

jezrael · Accepted Answer

I believe you need boolean indexing of Series created by stack, then rename_axis for new columns names with reset_index for columns from MultiIndex:

np.random.seed(456) 

correlations = pd.DataFrame(np.random.rand(5,3), columns=list('ABC'))
print (correlations)
          A         B         C
0  0.248756  0.163067  0.783643
1  0.808523  0.625628  0.604114
2  0.885702  0.759117  0.181105
3  0.150169  0.435679  0.385273
4  0.575710  0.146091  0.686593

s = correlations.stack()
df = s[np.abs(s) > 0.5].rename_axis(('idx','col')).reset_index(name='val')
print (df)
   idx col       val
0    0   C  0.783643
1    1   A  0.808523
2    1   B  0.625628
3    1   C  0.604114
4    2   A  0.885702
5    2   B  0.759117
6    4   A  0.575710
7    4   C  0.686593

Detail:

print (s)
0  A    0.248756
   B    0.163067
   C    0.783643
1  A    0.808523
   B    0.625628
   C    0.604114
2  A    0.885702
   B    0.759117
   C    0.181105
3  A    0.150169
   B    0.435679
   C    0.385273
4  A    0.575710
   B    0.146091
   C    0.686593
dtype: float64

pandas find row+col names for elements with condition

Answers (2)

Related Questions