Dropping duplicate value in column depending on value of another

Question

I wasn't sure of the best way to word my question. Suppose I have a dataframe:

id    decision
1     Yes
3     No
2     Yes
2     No
4     No
4     No

What I am looking to do is remove duplicates based on the id column so that there is only one row per id. However, for ids with multiple rows, if any of the values in decision is "Yes", then after removing the duplicates, the decision for the one remaining row should be "Yes".

So in this case, the output would look something like this, because at least one of the decisions for id 2 was "Yes".

id    decision
1     Yes
3     No
2     Yes
4     No

I was looking to use drop_duplicates(), but I can't decide which duplicate to keep just based on the first or last instance, because the "Yes" rows appear in different positions.

Any help?

BENY · Accepted Answer

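# 'No' sorts before 'Yes', so after sorting, keep='last' retains a 'Yes' row whenever an id has one
s = df.sort_values('decision').drop_duplicates('id', keep='last').sort_index()
s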
   id decision
0   1      Yes
1   3       No
2   2      Yes
5   4       No
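
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample frame from the question and wraps the one-liner in a helper; the name dedupe_prefer_yes is hypothetical, and kind='stable' is an addition that is not in the original answer, used here only so that ties between equal decisions are broken deterministically.

```python
import pandas as pd


def dedupe_prefer_yes(df):
    """Keep one row per id, preferring a row whose decision is 'Yes'.

    Hypothetical helper. Relies on the string 'No' sorting before 'Yes'.
    """
    return (
        df.sort_values('decision', kind='stable')  # 'No' rows first, 'Yes' rows last
          .drop_duplicates('id', keep='last')      # last row per id is 'Yes' if one exists
          .sort_index()                            # restore the original row order
    )


# Sample data from the question
df = pd.DataFrame({
    'id': [1, 3, 2, 2, 4, 4],
    'decision': ['Yes', 'No', 'Yes', 'No', 'No', 'No'],
})

s = dedupe_prefer_yes(df)
print(s)
```

Note that this works only because the two labels happen to sort in the right order. With labels that do not sort that way, the ordering would have to be made explicit, for example with an ordered categorical column.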
