Reputation: 9919
I have a Pandas DataFrame that is generated by computing pairwise correlations across variables.
corr = df.apply(lambda s: df.corrwith(s))
print('\n', 'Correlations')
print(corr.to_string())
The output looks like this:
Correlations
A B C D E
A 1.000000 -0.901104 0.662530 -0.772657 0.532606
B -0.901104 1.000000 -0.380257 0.946223 -0.830466
C 0.662530 -0.380257 1.000000 -0.227531 -0.102506
D -0.772657 0.946223 -0.227531 1.000000 -0.888768
E 0.532606 -0.830466 -0.102506 -0.888768 1.000000
However, this is a small sample of the correlation table, which can be over 300 rows x 300 cols. I'm trying to find a way to identify the coordinates for correlations within a specific value range.
For example, correlations between +0.25 and -0.25. My desired output would be:
E x C = -0.102506
D x C = -0.227531
In searching, I've found a few pandas functions that I'm unable to put together coherently: DataFrame.iloc, DataFrame.loc, and Series.between.
How would you suggest I go about accomplishing this filtering?
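One way to combine the pieces mentioned above: stack() flattens the matrix into a Series indexed by (row, column) pairs, and Series.between then filters by value. A minimal sketch, reconstructing the sample matrix above as a standalone DataFrame (note this keeps both members of each symmetric pair):

```python
import pandas as pd

# Standalone reproduction of the sample correlation matrix above.
data = {
    "A": [1.000000, -0.901104, 0.662530, -0.772657, 0.532606],
    "B": [-0.901104, 1.000000, -0.380257, 0.946223, -0.830466],
    "C": [0.662530, -0.380257, 1.000000, -0.227531, -0.102506],
    "D": [-0.772657, 0.946223, -0.227531, 1.000000, -0.888768],
    "E": [0.532606, -0.830466, -0.102506, -0.888768, 1.000000],
}
corr = pd.DataFrame(data, index=list("ABCDE"))

# Flatten to (row, col) pairs, then filter by value with Series.between.
flat = corr.stack()
in_range = flat[flat.between(-0.25, 0.25)]
print(in_range)
# Includes both (C, D) and (D, C) since the matrix is symmetric.
```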
Upvotes: 1
Views: 1073
Reputation: 59579
Use a boolean mask with DataFrame.where, then stack. We'll use np.triu to get rid of duplicates, since the correlation matrix is symmetric.
import numpy as np

# Mask values outside [-0.25, 0.25], keep only the upper triangle, then stack.
corr.where(np.triu((corr.values <= 0.25) & (corr.values >= -0.25))).stack()
C D -0.227531
E -0.102506
dtype: float64
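Putting the whole pipeline together end to end, with hypothetical random data standing in for the question's DataFrame (passing k=1 to np.triu also drops the all-1.0 diagonal, though those values fail the range check anyway), and formatting the result like the desired "E x C = -0.102506" output:

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same column layout as the question's DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("ABCDE"))

corr = df.corr()

# Mask values outside the range, keep the strict upper triangle (k=1 drops
# the diagonal), then stack into a Series indexed by (row, column) pairs.
mask = np.triu((corr.values <= 0.25) & (corr.values >= -0.25), k=1)
pairs = corr.where(mask).stack()

# Print in the question's desired "col x row = value" style.
for (a, b), value in pairs.items():
    print(f"{b} x {a} = {value:.6f}")
```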
Upvotes: 1