Reputation: 5552
I got a simple dataframe:
df
Out[102]:
0 1
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
5 cl_1 0.22795
6 cl_3 0.26985
7 es_3 0.37955
8 es_2 0.39450
9 zb_3 0.42170
10 es_1 0.45170
11 nfp_es 0.47190
12 zb_2 0.50130
13 nfp_cl 0.53170
14 nfp_gc 0.74260
15 gc_2 0.76640
16 gc_3 0.80915
17 zb_1 0.83010
18 gc_1 0.89795
All I am trying to do is to select the values greater than threshold a, and smaller than threshold b, where the two ranges of values are NON OVERLAPPING. Imagine (greater than 85% & smaller than 15%). Obvioulsy both conditions are independent. So I do it like this:
def filter(df):
df['filter'] = ""
df.loc[df[1] > 0.85, 'filter'] = 1
df.loc[df[1] < 0.15, 'filter'] = 1
df = df[df['filter'] == 1]
del df['filter']
return df
And I get the right answer:
filter(df)
Out[104]:
0 1
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
18 gc_1 0.89795
However, I would like to know if there is a direct way to do this, without creating a custom formula. Perhaps using groupby....
Thanks for the help
Upvotes: 3
Views: 152
Reputation: 21981
You could try df.query which was added in pandas v0.13
import pandas as pd
df = pd.read_clipboard()
df
A B
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
5 cl_1 0.22795
6 cl_3 0.26985
7 es_3 0.37955
8 es_2 0.39450
9 zb_3 0.42170
10 es_1 0.45170
11 nfp_es 0.47190
12 zb_2 0.50130
13 nfp_cl 0.53170
14 nfp_gc 0.74260
15 gc_2 0.76640
16 gc_3 0.80915
17 zb_1 0.83010
18 gc_1 0.89795
df.query('B > 0.85 or B < 0.15')
Upvotes: 3
Reputation: 12039
You probably want to use boolean masking.
mask1 = df['1'] > .85
mask2 = df['1'] < .15
filtered = df[mask1 | mask2]
Upvotes: 3
Reputation: 60080
You can just put all the conditions into the .loc
accessor, separated by an or operator:
df.loc[(df['1'] > 0.85) | (df['1'] < 0.15), :]
Out[19]:
0 1
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
18 gc_1 0.89795
The suggestions people have given in other answers should work equally well, you just have to flip the inequality and use or instead of and.
Upvotes: 3