hernanavella
hernanavella

Reputation: 5552

In Python, is there a direct way to filter a pd.dataframe conditional on 2 ranges of values of a column?

I got a simple dataframe:

df
Out[102]: 
         0        1
0   nfp_zb  0.04325
1   ftb_zb  0.05645
2   ftb_cl  0.09055
3     cl_2  0.12865
4   ftb_gc  0.13385
5     cl_1  0.22795
6     cl_3  0.26985
7     es_3  0.37955
8     es_2  0.39450
9     zb_3  0.42170
10    es_1  0.45170
11  nfp_es  0.47190
12    zb_2  0.50130
13  nfp_cl  0.53170
14  nfp_gc  0.74260
15    gc_2  0.76640
16    gc_3  0.80915
17    zb_1  0.83010
18    gc_1  0.89795

All I am trying to do is to select the values greater than threshold a, and smaller than threshold b, where the two ranges of values are NON OVERLAPPING. Imagine (greater than 85% & smaller than 15%). Obvioulsy both conditions are independent. So I do it like this:

def filter(df):
    df['filter'] = ""
    df.loc[df[1] > 0.85, 'filter'] = 1
    df.loc[df[1] < 0.15, 'filter'] = 1
    df = df[df['filter'] == 1]
    del df['filter']
    return df

And I get the right answer:

filter(df)
Out[104]: 
         0        1 
0   nfp_zb  0.04325       
1   ftb_zb  0.05645      
2   ftb_cl  0.09055      
3     cl_2  0.12865      
4   ftb_gc  0.13385      
18    gc_1  0.89795   

However, I would like to know if there is a direct way to do this, without creating a custom formula. Perhaps using groupby....

Thanks for the help

Upvotes: 3

Views: 152

Answers (3)

user308827
user308827

Reputation: 21981

You could try df.query which was added in pandas v0.13

import pandas as pd
df = pd.read_clipboard()
df

         A        B
0   nfp_zb  0.04325
1   ftb_zb  0.05645
2   ftb_cl  0.09055
3     cl_2  0.12865
4   ftb_gc  0.13385
5     cl_1  0.22795
6     cl_3  0.26985
7     es_3  0.37955
8     es_2  0.39450
9     zb_3  0.42170
10    es_1  0.45170
11  nfp_es  0.47190
12    zb_2  0.50130
13  nfp_cl  0.53170
14  nfp_gc  0.74260
15    gc_2  0.76640
16    gc_3  0.80915
17    zb_1  0.83010
18    gc_1  0.89795

df.query('B > 0.85 or B < 0.15')

Upvotes: 3

exp1orer
exp1orer

Reputation: 12039

You probably want to use boolean masking.

mask1 = df['1'] > .85
mask2 = df['1'] < .15

filtered = df[mask1 | mask2]

Upvotes: 3

Marius
Marius

Reputation: 60080

You can just put all the conditions into the .loc accessor, separated by an or operator:

df.loc[(df['1'] > 0.85) | (df['1'] < 0.15), :]
Out[19]: 
         0        1
0   nfp_zb  0.04325
1   ftb_zb  0.05645
2   ftb_cl  0.09055
3     cl_2  0.12865
4   ftb_gc  0.13385
18    gc_1  0.89795

The suggestions people have given in other answers should work equally well, you just have to flip the inequality and use or instead of and.

Upvotes: 3

Related Questions