thenac

Reputation: 305

How to filter a pandas DataFrame and keep specific elements?

I have a pandas DataFrame which is a 50x50 correlation matrix. The following picture shows an example of what I have:

[image: example of the correlation matrix]

What I would like to do, if it's possible, is build a new DataFrame containing only the elements of the old one that are higher than 0.5 or lower than -0.5 (indicating a strong linear relationship) but not equal to 1, to avoid the variance parts on the diagonal.

I don't think what I'm asking is exactly possible, because variable x0 won't have the same strong relationships that x1 has, and so on, so the new DataFrame won't look very good.

But is there a fast way to scan through this DataFrame, find the values I mentioned, and at least collect them into an array?

Any insight would be helpful. Thanks

Upvotes: 1

Views: 342

Answers (1)

Steven G

Reputation: 17122

You can't really keep the matrix layout if you want to drop correlation pairs that are too weak, because each variable has a different set of strong relationships. One thing you could do is stack the frame and keep only the relevant correlation pairs.

Given this frame (randomly generated as an example):

          0         1         2         3         4
0  0.038142 -0.881054 -0.718265 -0.037968 -0.587288
1  0.587694 -0.135326 -0.529463 -0.508112 -0.160751
2 -0.528640 -0.434885 -0.679416 -0.455866  0.077580
3  0.158409  0.827085  0.018871 -0.478428  0.129545
4  0.825489 -0.000416  0.682744  0.794137  0.694887

You could do the following, which returns the output shown after the code:

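import numpy as np
import pandas as pd

# Random 5x5 frame standing in for the correlation matrix
df = pd.DataFrame(np.random.uniform(-1, 1, (5, 5)))
# Reshape into a Series indexed by (row, column) pairs
stacked = df.stack()
# Keep strong values (above 0.5 or below -0.5) and drop exact 1s
strong = stacked[((stacked > 0.5) | (stacked < -0.5)) & (stacked != 1)]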


0  1   -0.881054
   2   -0.718265
   4   -0.587288
1  0    0.587694
   2   -0.529463
   3   -0.508112
2  0   -0.528640
   2   -0.679416
3  1    0.827085
4  0    0.825489
   2    0.682744
   3    0.794137
   4    0.694887
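
On a real correlation matrix, the diagonal may not compare exactly equal to 1 because of floating-point rounding, and every pair appears twice since the matrix is symmetric. A minimal sketch of one way to handle both, assuming the matrix comes from `DataFrame.corr()`: mask everything except the strict upper triangle before stacking. The helper name `strong_pairs`, the `threshold` parameter, and the sample columns are made up for illustration.

```python
import numpy as np
import pandas as pd

def strong_pairs(corr, threshold=0.5):
    """Return off-diagonal correlations with |r| > threshold, one entry per pair."""
    # Strict upper triangle (k=1): excludes the diagonal and the mirrored
    # duplicates below it.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # Masked cells become NaN; dropna() removes them explicitly.
    pairs = corr.where(upper).stack().dropna()
    return pairs[pairs.abs() > threshold]

# Hypothetical data: x4 is built from x0, so that pair is strongly correlated.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x0", "x1", "x2", "x3"])
data["x4"] = 2 * data["x0"] + rng.normal(scale=0.1, size=200)

result = strong_pairs(data.corr())
print(result)
```

The result is a Series with a two-level index (variable, variable), so `result.reset_index()` turns it into a three-column frame and `result.to_numpy()` gives just the values as an array.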

Upvotes: 1
