Reputation: 191
I have a dataframe and I want to check whether two or more 'Value_pack' rows share the same 'value' and 'discount' (i.e. duplicates). My first thought was to groupby the 'Value_pack' column.
I want to remove all but the first occurrence of duplicates from the dataframe.
Input Dataframe:
Value_pack  value  discount
val 1       ADA    0
val 2       ADB    100
val 2       ADA    0         <---- duplicate
val 3       ADA    50
val 3       ADC    50
val 4       ADV    40
Output Dataframe:
Value_pack  value  discount
val 1       ADA    0
val 2       ADB    100
val 3       ADA    50
val 3       ADC    50
val 4       ADV    40
df.groupby(['Value_pack']).drop_duplicates(['value', 'discount'])
This is the code I have so far but I can't work out how to get the dataframe I want.
Upvotes: 1
Views: 2137
Reputation: 6642
Using groupby you need to approach this from the other end: You group by value and discount and pick the first Value_pack:
df.groupby(["value", "discount"]).first().reset_index()
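One thing to watch with the one-liner above: by default groupby sorts the group keys and the key columns come first in the result, so the rows and columns will not be in the same order as the desired output. A minimal sketch of a variant that keeps the original ordering, assuming the sample data from the question (the names `df` and `grouped` are just for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Value_pack": ["val 1", "val 2", "val 2", "val 3", "val 3", "val 4"],
    "value": ["ADA", "ADB", "ADA", "ADA", "ADC", "ADV"],
    "discount": [0, 100, 0, 50, 50, 40],
})

# sort=False keeps groups in order of first appearance;
# as_index=False keeps 'value' and 'discount' as ordinary columns
grouped = df.groupby(["value", "discount"], sort=False, as_index=False).first()

# Put the columns back in the original order
grouped = grouped[df.columns]
print(grouped)
```

Note that `first()` picks the first non-null entry per column within each group, so if the data contains missing values this is not guaranteed to be the same as keeping the first row.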
Upvotes: 2
Reputation: 304
You don't need a groupby for this. Something like this should work:
df.drop_duplicates(subset=['value', 'discount'], keep='first')
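For reference, here is a small self-contained sketch of this approach on the question's sample data. The variable names (`mask`, `result`) are my own, and the `reset_index` step is optional, only there if you want a clean 0..n-1 index afterwards:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Value_pack": ["val 1", "val 2", "val 2", "val 3", "val 3", "val 4"],
    "value": ["ADA", "ADB", "ADA", "ADA", "ADC", "ADV"],
    "discount": [0, 100, 0, 50, 50, 40],
})

# Optional: inspect which rows would be flagged as duplicates
mask = df.duplicated(subset=["value", "discount"], keep="first")
print(df[mask])

# drop_duplicates returns a new DataFrame, so assign the result
result = df.drop_duplicates(subset=["value", "discount"], keep="first")

# The surviving rows keep their original index labels; reset if needed
result = result.reset_index(drop=True)
print(result)
```

Row order and column order are left as they were in the input, which matches the output asked for in the question.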
Upvotes: 1