pepe

Reputation: 9919

Pandas: how to find row and column for values in a range?

I have a Pandas DataFrame that is generated from performing multiple correlations across variables.

corr = df.apply(lambda s: df.corrwith(s))
print('\n', 'Correlations')
print(corr.to_string())

The output looks like this:

 Correlations
          A         B         C         D         E
A  1.000000 -0.901104  0.662530 -0.772657  0.532606
B -0.901104  1.000000 -0.380257  0.946223 -0.830466
C  0.662530 -0.380257  1.000000 -0.227531 -0.102506
D -0.772657  0.946223 -0.227531  1.000000 -0.888768
E  0.532606 -0.830466 -0.102506 -0.888768  1.000000

This is only a small sample; the full correlation table can exceed 300 rows x 300 columns. I'm trying to find the row and column labels of correlations that fall within a specific value range.

For example, for correlations between -0.25 and +0.25, my desired output would be:

E x C = -0.102506
D x C = -0.227531

While searching, I found a few pandas functions that I can't put together coherently: iloc, loc, and between.

How would you suggest I go about accomplishing this filtering?

Upvotes: 1

Views: 1073

Answers (1)

ALollz

Reputation: 59579

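Build a boolean mask and pass it to DataFrame.where. Because the correlation matrix is symmetric, np.triu keeps only the upper triangle so each pair is reported once. where turns every other cell into NaN, and stack reshapes what remains into a Series indexed by (row, column).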

import numpy as np

corr.where(np.triu((corr.values <= 0.25) & (corr.values >= -0.25))).stack()

C  D   -0.227531
   E   -0.102506
dtype: float64
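
To get the "E x C = -0.102506" style from the question, the stacked Series can be turned into strings. The following is only a sketch: it rebuilds the 5 x 5 sample by hand, and the names in_range, mask, pairs and lines are illustrative. It also passes k=1 to np.triu, which is a small change from the snippet above: it excludes the diagonal, so self-correlations of 1.0 would not show up if the range were widened to include 1.0. The explicit dropna() is there because, as far as I know, newer pandas versions no longer drop NaN in stack by default.

```python
import numpy as np
import pandas as pd

# Sample correlation matrix copied from the question.
labels = list("ABCDE")
values = [
    [ 1.000000, -0.901104,  0.662530, -0.772657,  0.532606],
    [-0.901104,  1.000000, -0.380257,  0.946223, -0.830466],
    [ 0.662530, -0.380257,  1.000000, -0.227531, -0.102506],
    [-0.772657,  0.946223, -0.227531,  1.000000, -0.888768],
    [ 0.532606, -0.830466, -0.102506, -0.888768,  1.000000],
]
corr = pd.DataFrame(values, index=labels, columns=labels)

# True where the correlation lies in [-0.25, 0.25].
in_range = (corr.values >= -0.25) & (corr.values <= 0.25)

# Keep only cells strictly above the diagonal (k=1), so each pair
# appears once and the diagonal is never reported.
mask = np.triu(in_range, k=1)

# Masked cells become NaN; stack gives a (row, column) MultiIndex,
# and dropna removes the NaN entries regardless of pandas version.
pairs = corr.where(mask).stack().dropna()

# Format each pair as "column x row = value".
lines = [f"{col} x {row} = {value:.6f}" for (row, col), value in pairs.items()]
for line in lines:
    print(line)
```

The pairs come out in row-major order of the upper triangle, so sort pairs first (for example with pairs.sort_values()) if you need a specific order.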

Upvotes: 1
