I wasn't sure of the best way to word my question. Suppose I have a DataFrame:
id decision
1 Yes
3 No
2 Yes
2 No
4 No
4 No
What I am looking to do is remove duplicates based on the id column so there is only one instance of each id. However, for ids with multiple instances, if any of the values in decision is "Yes", then after removing the duplicates the decision for the remaining row should be "Yes".
So in this case the output would look something like this, because at least one of the decisions for id 2 was "Yes":
id decision
1 Yes
3 No
2 Yes
4 No
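An alternative that avoids sorting entirely (a sketch, not one of the answers in this thread): group by `id` and ask whether any `decision` in the group equals "Yes". It assumes the decision column holds exactly the strings "Yes" and "No"; `sort=False` keeps the ids in order of first appearance, matching the desired output above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'id': [1, 3, 2, 2, 4, 4],
                   'decision': ['Yes', 'No', 'Yes', 'No', 'No', 'No']})

# True where the row says "Yes"; grouped by id, .any() is True if
# at least one row for that id says "Yes".
# sort=False keeps ids in order of first appearance (1, 3, 2, 4).
result = (
    df['decision'].eq('Yes')
      .groupby(df['id'], sort=False)
      .any()
      .map({True: 'Yes', False: 'No'})   # back to the original labels
      .reset_index(name='decision')      # columns: id, decision
)
print(result)
```

Because the decision is computed per group rather than picked from one row, the result does not depend on how the duplicate rows happen to be ordered.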
I was looking to use drop_duplicates(), but I can't decide which duplicate to keep based only on the first or last instance, because the "Yes" and "No" rows for an id can appear in either order.
Any help?
s = df.sort_values('decision').drop_duplicates('id', keep='last').sort_index()
id decision
0 1 Yes
1 3 No
2 2 Yes
5 4 No
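A self-contained version of the one-liner above, with comments on why it works. It relies on "No" sorting before "Yes" alphabetically; the `kind='mergesort'` argument is an addition not in the original answer, used so that ties (such as the two "No" rows for id 4) keep their original relative order and the kept row is deterministic.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 3, 2, 2, 4, 4],
                   'decision': ['Yes', 'No', 'Yes', 'No', 'No', 'No']})

# 'No' < 'Yes' as strings, so after sorting, any 'Yes' row for an id
# comes after that id's 'No' rows. mergesort is a stable sort.
ordered = df.sort_values('decision', kind='mergesort')

# keep='last' therefore keeps a 'Yes' row whenever the id has one;
# sort_index() puts the surviving rows back in their original order.
s = ordered.drop_duplicates('id', keep='last').sort_index()
print(s)
```

If the labels were ever something other than "Yes"/"No", the alphabetical trick would need to be replaced by sorting on an explicit key.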
Something like this might work (it does not preserve the original row order, though):
import pandas as pd
df = pd.DataFrame({'id':[1,3,2,2,4,4], 'decision':['Yes', 'No', 'Yes', 'No', 'No', 'No']})
df
id decision
0 1 Yes
1 3 No
2 2 Yes
3 2 No
4 4 No
5 4 No
df.sort_values(['id', 'decision'], ascending=[True, False]).drop_duplicates(['id'], keep='first')
id decision
0 1 Yes
2 2 Yes
1 3 No
4 4 No
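If the original row order matters, one option (a sketch, not part of the answer above) is to append `.sort_index()` after `drop_duplicates`, since the surviving rows still carry their original index labels:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 3, 2, 2, 4, 4],
                   'decision': ['Yes', 'No', 'Yes', 'No', 'No', 'No']})

# Within each id, descending string order puts 'Yes' ahead of 'No';
# keep='first' then keeps the 'Yes' row if one exists, and
# sort_index() restores the original row order.
out = (
    df.sort_values(['id', 'decision'], ascending=[True, False])
      .drop_duplicates('id', keep='first')
      .sort_index()
)
print(out)
```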