mllamazares

Reputation: 8166

How to remove duplicates according to an extra condition?

I have the following code to remove the duplicates of a dataframe based on a given key:

Input:

import pandas as pd
dff = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": ["A", "A", "B", "A"], "C": [0, 3, 1, 1]})
dff.drop_duplicates(subset=['A', 'B'], keep=False)

Output:

     A  B  C
2  foo  B  1
3  bar  A  1

But how can I group by the same key while selecting the larger number in column "C"? That is, the desired output would be:

     A  B  C
2  foo  B  3
3  bar  A  1

Upvotes: 0

Views: 72

Answers (1)

BENY

Reputation: 323226

It seems you need to overwrite column C with the group maximum (grouped by 'A') before dropping duplicates:

# Replace each value in C with the maximum of C within its group of A
dff['C'] = dff.groupby('A')['C'].transform('max')
dff.drop_duplicates(subset=['A', 'B'], keep=False)
Out[325]: 
     A  B  C
2  foo  B  3
3  bar  A  1
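
If you would rather not overwrite column C in the original dataframe, the same idea can be sketched as a single chained expression. This is only a minimal sketch: it assumes, as above, that the maximum should be taken per value of 'A', and the name `result` is chosen here purely for illustration.

```python
import pandas as pd

dff = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": ["A", "A", "B", "A"],
    "C": [0, 3, 1, 1],
})

# assign() returns a new dataframe, so dff itself keeps its original C values.
# transform('max') broadcasts the per-group maximum of C back to every row,
# and drop_duplicates(keep=False) then removes every row whose (A, B) pair
# occurs more than once.
result = (
    dff.assign(C=dff.groupby("A")["C"].transform("max"))
       .drop_duplicates(subset=["A", "B"], keep=False)
)
print(result)
```

Here `dff` is left unchanged, and `result` holds the two surviving rows (index 2 and 3) with C set to the maximum of their 'A' group.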

Upvotes: 1
