Reputation: 10021
I need to drop the groupby city
and district
rows, based on values of pct
, say if I want drop out if there are pct
is out of thresh -100
or 100
.
For example, we can see a
and d
group has a pct
value -627.44
which is lower than -100
, so we droped all a
and d
rows. Similarly, we can see for b
and e
has a pct
value 139.77
, so we drop them as well.
city district date price pct
0 a d 2020-01 2.42 -32.06
1 a d 2020-02 1.43 -41.07
2 a d 2020-03 0.27 -81.19
3 a d 2020-04 -1.42 -627.44 # fail, drop all "a"
4 b e 2020-03 2.14 -21.71
5 b e 2020-04 1.14 -46.91
6 b e 2020-05 -0.45 139.77 # fail, drop all "b"
7 c f 2020-04 4.59 -22.26
8 c f 2020-05 2.33 -49.13
The desired output will look like this. How can I do that? Thank you.
city district date price pct
0 c f 2020-04 4.59 -22.26
1 c f 2020-05 2.33 -49.13
Upvotes: 1
Views: 221
Reputation: 30920
or we can use GroupBy.filter
:
df.groupby(['city','district']).filter(lambda x: (x['pct'].between(-100,100)).all())
Output
city district date price pct
0 c f 2020-04 4.59 -22.26
1 c f 2020-05 2.33 -49.13
Upvotes: 2
Reputation: 402533
Create a boolean condition and broadcast it to all rows per group using GroupBy.transform
:
df[(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')]
city district date price pct
7 c f 2020-04 4.59 -22.26
8 c f 2020-05 2.33 -49.13
The result of transform
will tell you which groups satisfy this condition:
(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 True
8 True
Name: pct, dtype: bool
You'd then use this to filter df.
(df.loc[(df['pct'].between(-100, 100)).groupby(df['city']).transform('all')]
.reset_index(drop=True))
city district date price pct
0 c f 2020-04 4.59 -22.26
1 c f 2020-05 2.33 -49.13
Upvotes: 1