Reputation: 8166
I have the following code to remove duplicates from a DataFrame based on a given key:
Input:
import pandas as pd
dff = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": ["A", "A", "B", "A"], "C": [0, 3, 1, 1]})
dff.drop_duplicates(subset=['A', 'B'], keep=False)
Output:
     A  B  C
2  foo  B  1
3  bar  A  1
But how can I group by the same key while keeping the largest value of column "C"? That is, the desired output would be:
     A  B  C
2  foo  B  3
3  bar  A  1
Upvotes: 0
Views: 72
Reputation: 323226
It seems you need to overwrite column C with the group max (grouped by A) before you drop the duplicates:
# Replace every value in C with the maximum of its "A" group
dff['C'] = dff.groupby('A')['C'].transform('max')
dff.drop_duplicates(subset=['A', 'B'], keep=False)
Out[325]:
     A  B  C
2  foo  B  3
3  bar  A  1
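For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes you would rather not modify dff in place, so it uses assign to build a copy; the names with_max and result are only illustrative.

```python
import pandas as pd

dff = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": ["A", "A", "B", "A"],
    "C": [0, 3, 1, 1],
})

# Build a copy in which C holds the maximum of its "A" group;
# dff itself is left unchanged.
with_max = dff.assign(C=dff.groupby("A")["C"].transform("max"))

# keep=False drops every row whose (A, B) pair occurs more than once.
result = with_max.drop_duplicates(subset=["A", "B"], keep=False)
print(result)
```

Note that the max is taken per value of A, not per (A, B) pair, which is what makes the foo/B row come out as 3.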
Upvotes: 1