sjkluend

Reputation: 9

Dropping duplicate values in a column depending on the value of another

I wasn't sure of the best way to word my question. Suppose I have a dataframe

id    decision
1     Yes
3     No
2     Yes
2     No
4     No
4     No

What I am looking to do is remove duplicates based on the id column so that there is only one row for each id. However, for ids with multiple rows, if any of the values in decision is "Yes", then after removing the duplicates, the decision for the one remaining row should be "Yes".

So in this case, the output would look something like this, because at least one of the decisions for id 2 was "Yes".

id    decision
1     Yes
3     No
2     Yes
4     No

I was looking to use drop_duplicates(), but I can't decide which duplicate to keep just based on the first or last instance, because the "Yes" and "No" rows appear in different orders.

Any help?

Upvotes: 0

Views: 45

Answers (2)

BENY

Reputation: 323276

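# 'No' sorts before 'Yes', so keep='last' retains a 'Yes' row whenever an id has one;
# sort_index() then restores the original row order.
s = df.sort_values('decision').drop_duplicates('id', keep='last').sort_index()
s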
   id decision
0   1      Yes
1   3       No
2   2      Yes
5   4       No
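
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same sort-then-drop idea, using the sample data from the question. The kind='stable' argument is my addition (an assumption, not part of the one-liner above); it only makes the choice between tied rows deterministic.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [1, 3, 2, 2, 4, 4],
    'decision': ['Yes', 'No', 'Yes', 'No', 'No', 'No'],
})

s = (
    df.sort_values('decision', kind='stable')  # 'No' rows first, 'Yes' rows last
      .drop_duplicates('id', keep='last')      # last row per id is 'Yes' if any exists
      .sort_index()                            # back to the original row order
)
print(s)
```

Note that this relies on 'No' sorting alphabetically before 'Yes'; with other labels the sort direction or the keep argument may need to change.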

Upvotes: 1

Sajan

Reputation: 1267

Something like this might work (it does not preserve the original row order, though):

import pandas as pd
df = pd.DataFrame({'id':[1,3,2,2,4,4], 'decision':['Yes', 'No', 'Yes', 'No', 'No', 'No']})
df 
    id decision
0   1      Yes
1   3       No
2   2      Yes
3   2       No
4   4       No
5   4       No

df.sort_values(['id', 'decision'], ascending=[True, False]).drop_duplicates(['id'], keep='first')
    id decision
0   1      Yes
2   2      Yes
1   3       No
4   4       No
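
If the order matters, one possible alternative (a different technique from the sort above, sketched here under the assumption that decision only ever holds 'Yes' or 'No') is to group by id and check whether any row in the group is 'Yes'. Passing sort=False to groupby keeps the ids in order of first appearance. The names is_yes, any_yes and result are just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [1, 3, 2, 2, 4, 4],
    'decision': ['Yes', 'No', 'Yes', 'No', 'No', 'No'],
})

# True where the row's decision is 'Yes'
is_yes = df['decision'].eq('Yes')

# One boolean per id: True if any of its rows was 'Yes';
# sort=False keeps ids in order of first appearance
any_yes = is_yes.groupby(df['id'], sort=False).any()

# Map the booleans back to the original labels and rebuild a two-column frame
result = any_yes.map({True: 'Yes', False: 'No'}).reset_index()
print(result)
```

This does not depend on how the labels sort, but it does rebuild the decision column rather than keeping one of the original rows, so the original index is lost.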

Upvotes: 0
