AOJ keygen

Reputation: 103

drop_duplicates on a specific column, keeping the row with the latest value?

Is there a way to customize drop_duplicates so that it also drops "kind of" duplicates, i.e. rows that share an ID but differ in other columns?

Example pandas DataFrame:

Year Name ID City
2011 Superman 101 Metropolis
2011 Batman 102 Gotham
2012 The Batman 102 Gotham
2011 Noobmaster69 103 Online
2011 Noobmaster69 103 Online

I tried drop_duplicates, which only removes the exact duplicate row, and got this:

Year Name ID City
2011 Superman 101 Metropolis
2011 Batman 102 Gotham
2012 The Batman 102 Gotham
2011 Noobmaster69 103 Online

I actually want to reduce it further: for ID 102 I want to keep only the "The Batman" row, since it is the newer info (2012 > 2011). I expect something like this:

Year Name ID City
2011 Superman 101 Metropolis
2012 The Batman 102 Gotham
2011 Noobmaster69 103 Online

Upvotes: 0

Views: 48

Answers (1)

Siva Reddy

Reputation: 73

Try this. The duplicates can easily be dropped using the ID column.

import pandas as pd

# read your table data; read_csv already returns a DataFrame
df = pd.read_csv("your_filename.csv")

df = df.drop_duplicates(subset='ID', keep='last')

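subset='ID' makes drop_duplicates compare rows on the ID column only, and keep='last' keeps the last occurrence of each ID and drops the earlier ones. This returns the newest row only if the rows for each ID already appear from oldest to newest, as they do in your example.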
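
Since keep='last' depends on row order, it may be safer to sort by Year first when the file is not guaranteed to be ordered. Below is a minimal sketch that rebuilds the sample data from the question inline (instead of reading a CSV) and uses a stable sort; the variable name latest and the final sort_index() step, which restores the original row order, are my own additions and not part of the code above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2011, 2011],
    "Name": ["Superman", "Batman", "The Batman", "Noobmaster69", "Noobmaster69"],
    "ID": [101, 102, 102, 103, 103],
    "City": ["Metropolis", "Gotham", "Gotham", "Online", "Online"],
})

latest = (
    # Stable sort by Year so the newest row of each ID ends up last
    df.sort_values("Year", kind="mergesort")
      # Compare rows on ID only and keep the last (newest) one
      .drop_duplicates(subset="ID", keep="last")
      # Put the surviving rows back in their original order
      .sort_index()
)
print(latest)
```

On the sample data this should leave one row per ID, with "The Batman" (2012) kept for ID 102.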

Upvotes: 1
