Reputation: 886

Python: find duplicates across multiple columns

How do I filter a DataFrame to show only the rows that are duplicated across multiple columns?

Example dataframe:

col1 col2 col3
A1    B1   C1
A1    B1   C1
A1    B1   C2
A2    B2   C2

Expected output:

col1 col2 col3
A1    B1   C1
A1    B1   C1

My attempt:

df[df.duplicated(['col1', 'col2', 'col3'], keep=False)]

but this does not give the expected outcome.

Upvotes: 2

Answers (1)

Reputation: 11105

Your attempt df[df.duplicated(['col1', 'col2', 'col3'], keep=False)] works in my testing and returns the two identical rows. Since duplicated() compares all columns by default, you can leave out the column names:

df[df.duplicated(keep=False)]
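
For reference, here is a minimal sketch of how I tested it. It assumes the example frame from the question holds plain strings; the variable names mask and result are just mine for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed to be plain strings)
df = pd.DataFrame({
    "col1": ["A1", "A1", "A1", "A2"],
    "col2": ["B1", "B1", "B1", "B2"],
    "col3": ["C1", "C1", "C2", "C2"],
})

# keep=False flags every row that belongs to a duplicate group,
# not only the second and later occurrences
mask = df.duplicated(subset=["col1", "col2", "col3"], keep=False)
result = df[mask]

print(result)
```

With the default keep='first', only the second A1/B1/C1 row would be flagged, so keep=False is what returns both copies.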

Upvotes: 7