Alessandrini

Reputation: 191

Pandas Groupby and find duplicates in multiple columns

I have a dataframe and I want to group by the 'Value_pack' column and check whether two or more 'Value_pack' entries have the same 'value' and 'discount' (duplicates).

I want to remove all but the first occurrence of duplicates from the dataframe.

Input Dataframe:

    Value_pack  value  discount
    val 1       ADA    0
    val 2       ADB    100
    val 2       ADA    0      <---- duplicate
    val 3       ADA    50
    val 3       ADC    50
    val 4       ADV    40

Output Dataframe:

    Value_pack  value  discount
    val 1       ADA    0
    val 2       ADB    100
    val 3       ADA    50
    val 3       ADC    50
    val 4       ADV    40

df.groupby(['Value_pack']).drop_duplicates(['value', 'discount'])

This is the code I have so far but I can't work out how to get the dataframe I want.

Upvotes: 1

Views: 2137

Answers (3)

mcsoini

Reputation: 6642

Using groupby, you need to approach this from the other end: group by value and discount and pick the first Value_pack:

df.groupby(["value", "discount"]).first().reset_index()

Upvotes: 2

Akash Dubey

Reputation: 304

You don't need a groupby for this; something like this could help:

df.drop_duplicates(subset = ['value', 'discount'], keep = 'first')
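
A minimal runnable sketch, assuming the sample data from the question:

    import pandas as pd

    # Sample data from the question
    df = pd.DataFrame({
        "Value_pack": ["val 1", "val 2", "val 2", "val 3", "val 3", "val 4"],
        "value": ["ADA", "ADB", "ADA", "ADA", "ADC", "ADV"],
        "discount": [0, 100, 0, 50, 50, 40],
    })

    # Keep only the first row for each (value, discount) combination
    out = df.drop_duplicates(subset=["value", "discount"], keep="first")
    print(out)

This keeps the original row and column order, which matches the desired output.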

Upvotes: 1

filiabel

Reputation: 415

No need to use groupby. Try: df.drop_duplicates(subset=['value', 'discount']). Check out the pandas docs for drop_duplicates.

Upvotes: 1
