kand
kand

Reputation: 2338

pandas find row+col names for elements with condition

I have a pandas dataframe of correlations. I'm searching through the dataframe to find correlations greater than the threshold 0.5 then return the column name and row name (both are strings) for those elements that match the condition.

Currently, I can transform the correlation matrix to values greater than the threshold, then all other values are NaN using this:

correlations[np.abs(correlations) > 0.5]

Now that I have this matrix of numbers that match my query and NaNs, how do I get the row+col names of each element whose value is not NaN?

Upvotes: 2

Views: 386

Answers (2)

BENY
BENY

Reputation: 323226

Or you can using melt

correlations.where(correlations.abs().gt(0.5)).reset_index().melt('index').dropna()
Out[357]: 
    index variable     value
1       1        A  0.808523
2       2        A  0.885702
4       4        A  0.575710
6       1        B  0.625628
7       2        B  0.759117
10      0        C  0.783643
11      1        C  0.604114
14      4        C  0.686593

Upvotes: 1

jezrael
jezrael

Reputation: 862511

I believe you need boolean indexing of Series created by stack, then rename_axis for new columns names with reset_index for columns from MultiIndex:

np.random.seed(456) 

correlations = pd.DataFrame(np.random.rand(5,3), columns=list('ABC'))
print (correlations)
          A         B         C
0  0.248756  0.163067  0.783643
1  0.808523  0.625628  0.604114
2  0.885702  0.759117  0.181105
3  0.150169  0.435679  0.385273
4  0.575710  0.146091  0.686593

s = correlations.stack()
df = s[np.abs(s) > 0.5].rename_axis(('idx','col')).reset_index(name='val')
print (df)
   idx col       val
0    0   C  0.783643
1    1   A  0.808523
2    1   B  0.625628
3    1   C  0.604114
4    2   A  0.885702
5    2   B  0.759117
6    4   A  0.575710
7    4   C  0.686593

Detail:

print (s)
0  A    0.248756
   B    0.163067
   C    0.783643
1  A    0.808523
   B    0.625628
   C    0.604114
2  A    0.885702
   B    0.759117
   C    0.181105
3  A    0.150169
   B    0.435679
   C    0.385273
4  A    0.575710
   B    0.146091
   C    0.686593
dtype: float64

Upvotes: 3

Related Questions