Drop all rows in a group if a condition is not satisfied

Question

I need to drop the groupby city and district rows, based on values of pct, say if I want drop out if there are pct is out of thresh -100 or 100.

For example, we can see a and d group has a pct value -627.44 which is lower than -100, so we droped all a and d rows. Similarly, we can see for b and e has a pct value 139.77, so we drop them as well.

  city district     date  price     pct
0    a        d  2020-01   2.42  -32.06
1    a        d  2020-02   1.43  -41.07
2    a        d  2020-03   0.27  -81.19
3    a        d  2020-04  -1.42 -627.44   # fail, drop all "a"
4    b        e  2020-03   2.14  -21.71
5    b        e  2020-04   1.14  -46.91   
6    b        e  2020-05  -0.45  139.77   # fail, drop all "b"
7    c        f  2020-04   4.59  -22.26
8    c        f  2020-05   2.33  -49.13

The desired output will look like this. How can I do that? Thank you.

  city district     date  price    pct
0    c        f  2020-04   4.59 -22.26
1    c        f  2020-05   2.33 -49.13

cs95 · Accepted Answer

Create a boolean condition and broadcast it to all rows per group using GroupBy.transform:

df[(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')] 

  city district     date  price    pct
7    c        f  2020-04   4.59 -22.26
8    c        f  2020-05   2.33 -49.13

The result of transform will tell you which groups satisfy this condition:

(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7     True
8     True
Name: pct, dtype: bool

You'd then use this to filter df.

(df.loc[(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')] 
   .reset_index(drop=True))

  city district     date  price    pct
0    c        f  2020-04   4.59 -22.26
1    c        f  2020-05   2.33 -49.13

Drop all rows in a group if a condition is not satisfied

Answers (2)

Related Questions