Lynn

Reputation: 4398

Remove duplicate rows in Pandas (possibly by group)

I have a dataset, df, with the following data:

starttime               endtime              ID  Diff  
1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10
1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10
1/10/2020 9:06:00 PM    1/10/2020 9:06:10    B    10

Desired outcome:

starttime               endtime              ID  Diff
1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10
1/10/2020 9:06:00 PM    1/10/2020 9:06:10    B    10

Note that one of the rows from group A was removed because it was an exact duplicate:

1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10

This is the code I am using. However, I am unsure what to include in the parentheses, or whether this is correct:

df.drop_duplicates(subset=None, keep=False)

Any suggestions are appreciated.

Upvotes: 1

Views: 62

Answers (2)

lsabi

Reputation: 4456

Try looking at the docs. If you can't figure out what's most appropriate for your case, then ask again, providing some context (e.g. an example).

The link is for pandas 0.25

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
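
As a rough illustration of what the docs describe, here is a minimal sketch of the keep parameter applied to the sample data from the question. The variable names (kept_first, kept_last, dropped_all) are only illustrative, and the dates are left as plain strings for simplicity.

```python
import pandas as pd

# Sample data copied from the question; dates kept as strings for simplicity.
df = pd.DataFrame({
    "starttime": ["1/10/2020 9:05:00 PM", "1/10/2020 9:05:00 PM", "1/10/2020 9:06:00 PM"],
    "endtime": ["1/10/2020 9:05:10", "1/10/2020 9:05:10", "1/10/2020 9:06:10"],
    "ID": ["A", "A", "B"],
    "Diff": [10, 10, 10],
})

# With no subset, all columns are compared, so only exact duplicate rows match.
kept_first = df.drop_duplicates()              # keep='first' is the default
kept_last = df.drop_duplicates(keep="last")    # keeps the last copy instead
dropped_all = df.drop_duplicates(keep=False)   # drops every row that has a duplicate

print(kept_first)
print(kept_last)
print(dropped_all)
```

drop_duplicates returns a new DataFrame, so assign the result back (for example df = df.drop_duplicates()) if you want to keep it. For the desired outcome in the question, the default keep='first' is what leaves one row for A and one for B.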

Upvotes: 1

Kenan

Reputation: 14094

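You can supply the column to subset. Use keep='first' (the default) rather than keep=False, because keep=False drops every row that has a duplicate, including the one you want to keep:

df.drop_duplicates(subset='ID', keep='first')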
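
To show how subset and keep interact, here is a small sketch based on the question's data. The fourth row (a second, different B record) is a hypothetical addition that is not in the original post; it is there only to show the difference between comparing whole rows and comparing the ID column alone.

```python
import pandas as pd

df = pd.DataFrame({
    "starttime": ["1/10/2020 9:05:00 PM", "1/10/2020 9:05:00 PM",
                  "1/10/2020 9:06:00 PM", "1/10/2020 9:07:00 PM"],
    "endtime": ["1/10/2020 9:05:10", "1/10/2020 9:05:10",
                "1/10/2020 9:06:10", "1/10/2020 9:07:10"],
    "ID": ["A", "A", "B", "B"],   # last row is hypothetical: same ID, different times
    "Diff": [10, 10, 10, 10],
})

# Compare whole rows: only the exact duplicate of A is removed.
exact = df.drop_duplicates()

# Compare only the ID column: one row per ID survives,
# so the second (non-identical) B row is removed as well.
by_id = df.drop_duplicates(subset="ID")

# keep=False with subset='ID' removes every row whose ID is repeated.
by_id_none = df.drop_duplicates(subset="ID", keep=False)

print(exact)
print(by_id)
print(by_id_none)
```

If the goal is only to remove exact duplicates, leaving subset unset is the safer choice; subset='ID' is appropriate when you want at most one row per ID regardless of the other columns.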

Upvotes: 2
