Reputation: 2338
I have a pandas dataframe of correlations. I'm searching through the dataframe to find correlations greater than the threshold 0.5
then return the column name and row name (both are strings) for those elements that match the condition.
Currently, I can transform the correlation matrix to values greater than the threshold, then all other values are NaN
using this:
correlations[np.abs(correlations) > 0.5]
Now that I have this matrix of numbers that match my query and NaN
s, how do I get the row+col names of each element whose value is not NaN
?
Upvotes: 2
Views: 386
Reputation: 323226
Or you can using melt
correlations.where(correlations.abs().gt(0.5)).reset_index().melt('index').dropna()
Out[357]:
index variable value
1 1 A 0.808523
2 2 A 0.885702
4 4 A 0.575710
6 1 B 0.625628
7 2 B 0.759117
10 0 C 0.783643
11 1 C 0.604114
14 4 C 0.686593
Upvotes: 1
Reputation: 862511
I believe you need boolean indexing
of Series
created by stack
, then rename_axis
for new columns names with reset_index
for columns from MultiIndex
:
np.random.seed(456)
correlations = pd.DataFrame(np.random.rand(5,3), columns=list('ABC'))
print (correlations)
A B C
0 0.248756 0.163067 0.783643
1 0.808523 0.625628 0.604114
2 0.885702 0.759117 0.181105
3 0.150169 0.435679 0.385273
4 0.575710 0.146091 0.686593
s = correlations.stack()
df = s[np.abs(s) > 0.5].rename_axis(('idx','col')).reset_index(name='val')
print (df)
idx col val
0 0 C 0.783643
1 1 A 0.808523
2 1 B 0.625628
3 1 C 0.604114
4 2 A 0.885702
5 2 B 0.759117
6 4 A 0.575710
7 4 C 0.686593
Detail:
print (s)
0 A 0.248756
B 0.163067
C 0.783643
1 A 0.808523
B 0.625628
C 0.604114
2 A 0.885702
B 0.759117
C 0.181105
3 A 0.150169
B 0.435679
C 0.385273
4 A 0.575710
B 0.146091
C 0.686593
dtype: float64
Upvotes: 3