How to find duplicates in pandas?

I have a DataFrame of about 52,000 rows with some duplicates. When I use

df.drop_duplicates()

I lose about 1,000 rows, but I don't want to erase these rows; I want to know which rows are the duplicates.

Upvotes: 2

Views: 14347

Answers (2)

Anton Protopopov

Reputation: 31662

You could use duplicated for that:

df[df.duplicated()]

You can specify the keep argument for what you want; from the docs:

keep : {‘first’, ‘last’, False}, default ‘first’

  • first : Mark duplicates as True except for the first occurrence.
  • last : Mark duplicates as True except for the last occurrence.
  • False : Mark all duplicates as True.

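A minimal sketch of the three keep options on a toy frame (the column name and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with repeated values in column "a"
df = pd.DataFrame({"a": [1, 1, 2, 3, 3, 3]})

# keep='first' (the default): flag every occurrence after the first
first_dups = df[df.duplicated()]

# keep=False: flag every row that has a duplicate anywhere
all_dups = df[df.duplicated(keep=False)]
```

With keep=False you get the full duplicate groups, which is usually what you want when inspecting rather than dropping.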
Upvotes: 10

Arthur D. Howland

Reputation: 4547

To identify duplicates within a pandas column without dropping them, let 'Column_A' be the column with duplicate entries and 'Column_B' a True/False column that marks duplicates in Column_A:

df['Column_B'] = df.duplicated(subset='Column_A', keep='first')

Change the parameters to fine-tune it to your needs.
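A short self-contained sketch of this approach (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical data: Column_A holds the values to check for duplicates
df = pd.DataFrame({"Column_A": ["x", "y", "x", "z"]})

# Column_B flags every occurrence after the first as a duplicate;
# subset= restricts the duplicate check to Column_A alone
df["Column_B"] = df.duplicated(subset="Column_A", keep="first")
```

The original rows are untouched; you can then filter with df[df["Column_B"]] to inspect the duplicate rows.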

Upvotes: 0
