Reputation: 43
I have the following DataFrame consisting out of columns id, brand and count
Id brand count
1 Audi 3
2 BWM 5
2 FORD 3
3 AUDI 7
4 BMW 2
5 Audi 4
5 FORD 3
I would like to groupby id and only remain each id with the brand that has the highest count. So in the end I would like to have the following:
id brand
1 AUDI
2 BMW
3 AUDI
4 BMW
5 AUDI
I have something like this but that obviously is not working. So what would be the correct function or syntax to accomplish that? Thanks!
data.groupby('id')['brand'].where(max('count'))
Upvotes: 4
Views: 52
Reputation: 323326
IIUC
df=df.sort_values(['Id','count']).drop_duplicates('Id',keep='last')
Out[249]:
Id brand count
0 1 Audi 3
1 2 BWM 5
3 3 AUDI 7
4 4 BMW 2
5 5 Audi 4
Upvotes: 1
Reputation: 18647
IIUC use groupby.idxmax
and loc
:
df.loc[df.groupby('Id')['count'].idxmax()]
[out]
Id brand count
0 1 Audi 3
1 2 BWM 5
3 3 AUDI 7
4 4 BMW 2
5 5 Audi 4
Upvotes: 3