Reputation: 1993
I have a pandas data frame df
like this.
In [1]: df
Out[1]:
country count
0 Japan 78
1 Japan 80
2 USA 45
3 France 34
4 France 90
5 UK 45
6 UK 34
7 China 32
8 China 87
9 Russia 20
10 Russia 67
I want to remove rows with the maximum value in each group. So the result should look like:
country count
0 Japan 78
3 France 34
6 UK 34
7 China 32
9 Russia 20
My first attempt:
idx = df.groupby(['country'], sort=False).max()['count'].index
df_new = df.drop(list(idx))
My second attempt:
idx = df.groupby(['country'])['count'].transform(max).index
df_new = df.drop(list(idx))
But it didn't work. Any ideas?
Upvotes: 4
Views: 5427
Reputation: 164773
You can first calculate a series of maximums by group. Then filter out instances where count is equal to that series. Note this will also remove duplicates maximums.
g = df.groupby(['country'])['count'].transform('max')
df = df[~(df['count'] == g)]
The series g
represents maximums for each row by group. Where this equals df['count']
(by index), you have a row where you have the maximum for your group. You then use ~
for the negative condition.
print(df.groupby(['country'])['count'].transform('max'))
0 80
1 80
2 45
3 90
4 90
5 45
6 45
7 87
8 87
9 20
Name: count, dtype: int64
Alternatively, you can sort and drop the final occurrence:
res = df.sort_values('count')
res = res.drop(res.groupby('country').tail(1).index)
print(res)
country count
9 Russia 20
7 China 32
3 France 34
6 UK 34
0 Japan 78
Upvotes: 12