ah bon
ah bon

Reputation: 10021

Drop all rows in a group if a condition is not satisfied

I need to drop the groupby city and district rows, based on values of pct, say if I want drop out if there are pct is out of thresh -100 or 100.

For example, we can see a and d group has a pct value -627.44 which is lower than -100, so we droped all a and d rows. Similarly, we can see for b and e has a pct value 139.77, so we drop them as well.

  city district     date  price     pct
0    a        d  2020-01   2.42  -32.06
1    a        d  2020-02   1.43  -41.07
2    a        d  2020-03   0.27  -81.19
3    a        d  2020-04  -1.42 -627.44   # fail, drop all "a"
4    b        e  2020-03   2.14  -21.71
5    b        e  2020-04   1.14  -46.91   
6    b        e  2020-05  -0.45  139.77   # fail, drop all "b"
7    c        f  2020-04   4.59  -22.26
8    c        f  2020-05   2.33  -49.13

The desired output will look like this. How can I do that? Thank you.

  city district     date  price    pct
0    c        f  2020-04   4.59 -22.26
1    c        f  2020-05   2.33 -49.13

Upvotes: 1

Views: 221

Answers (2)

ansev
ansev

Reputation: 30920

or we can use GroupBy.filter:

df.groupby(['city','district']).filter(lambda x: (x['pct'].between(-100,100)).all()) 

Output

  city district     date  price    pct
0    c        f  2020-04   4.59 -22.26
1    c        f  2020-05   2.33 -49.13

Upvotes: 2

cs95
cs95

Reputation: 402533

Create a boolean condition and broadcast it to all rows per group using GroupBy.transform:

df[(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')] 

  city district     date  price    pct
7    c        f  2020-04   4.59 -22.26
8    c        f  2020-05   2.33 -49.13

The result of transform will tell you which groups satisfy this condition:

(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7     True
8     True
Name: pct, dtype: bool

You'd then use this to filter df.


(df.loc[(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')] 
   .reset_index(drop=True))

  city district     date  price    pct
0    c        f  2020-04   4.59 -22.26
1    c        f  2020-05   2.33 -49.13

Upvotes: 1

Related Questions