hacaho

Reputation: 81

Pandas drop last duplicate record and keep remaining

I have a pandas DataFrame with the columns name, school and marks:

name  school  marks

tom     HBS     55
tom     HBS     54
tom     HBS     12
mark    HBS     28
mark    HBS     19
lewis   HBS     88

How can I drop the last duplicate row of each group and keep the remaining data? Expected result:

name  school  marks

tom     HBS     55
tom     HBS     54
mark    HBS     28
lewis   HBS     88

I tried this:

df.drop_duplicates(['name','school'], keep='last')
print(df)

Upvotes: 2

Views: 1104

Answers (2)

MrV

Reputation: 169

I adapted @DSM's answer (from here), taking into account that you want to keep rows that have no duplicates:

df.groupby("name", as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[:-1]).reset_index(drop=True)
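The same idea can be tried on the sample data from the question. This is only a minimal sketch: the helper name drop_last_if_repeated is hypothetical, the frame is rebuilt by hand, and the groups are iterated explicitly instead of going through apply, so the original row labels are preserved.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 54, 12, 28, 19, 88],
})

def drop_last_if_repeated(group):
    # Hypothetical helper: keep single-row groups untouched,
    # otherwise drop the last row of the group.
    return group if len(group) == 1 else group.iloc[:-1]

# sort=False keeps the groups in order of first appearance.
parts = [
    drop_last_if_repeated(group)
    for _, group in df.groupby(["name", "school"], sort=False)
]
result = pd.concat(parts)
print(result)
```

Grouping on both name and school matches the columns used in the question; grouping on name alone gives the same rows for this particular data.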

Upvotes: 0

mozway

Reputation: 261880

If you want to drop only the last duplicate, you need to use two masks:

m1 = df.duplicated(['name','school'], keep="last") # True if a later duplicate exists (not the last row of its group)
m2 = ~df.duplicated(['name','school'], keep=False) # True if the row has no duplicate at all
new_df = df[m1|m2]

output:

    name school  marks
0    tom    HBS     55
1    tom    HBS     54
3   mark    HBS     28
5  lewis    HBS     88
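For anyone who wants to reproduce this, here is a self-contained sketch of the two-mask approach. The DataFrame is rebuilt by hand from the question, and the variable names has_later_duplicate and is_unique are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 54, 12, 28, 19, 88],
})

keys = ["name", "school"]

# True for rows that are followed by another row with the same keys,
# i.e. every row of a duplicated group except its last one.
has_later_duplicate = df.duplicated(keys, keep="last")

# True for rows whose key combination occurs exactly once.
is_unique = ~df.duplicated(keys, keep=False)

# Keep a row if either condition holds.
new_df = df[has_later_duplicate | is_unique]
print(new_df)

# For comparison: keep="last" on drop_duplicates keeps ONLY the last row per group.
only_last = df.drop_duplicates(keys, keep="last")
```

Note that drop_duplicates returns a new DataFrame and leaves df unchanged, so printing df after calling it without assigning the result shows the original data.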

Upvotes: 3
