How do I filter a DataFrame to show only the rows that are duplicated across multiple columns?
Example dataframe:
col1 col2 col3
A1 B1 C1
A1 B1 C1
A1 B1 C2
A2 B2 C2
Expected output:
col1 col2 col3
A1 B1 C1
A1 B1 C1
My attempt:
df[df.duplicated(['col1', 'col2', 'col3'], keep=False)]
But this does not give the expected outcome.
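Your attempt df[df.duplicated(['col1', 'col2', 'col3'], keep=False)]
works in my testing: it returns both A1 B1 C1 rows. The subset argument defaults to all columns, and col1 to col3 are the only columns here, so you can leave out the column names: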
df[df.duplicated(keep=False)]
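Here is a minimal reproduction. It assumes pandas is installed and rebuilds the example data as plain string columns, since the original DataFrame construction isn't shown. It checks that both calls select the same two rows. It also contrasts them with the default keep="first", which flags only the repeat and not the first occurrence. That is why keep=False is needed to get both A1 B1 C1 rows.

```python
import pandas as pd

# Rebuild the example DataFrame from the question (assumed: string columns).
df = pd.DataFrame({
    "col1": ["A1", "A1", "A1", "A2"],
    "col2": ["B1", "B1", "B1", "B2"],
    "col3": ["C1", "C1", "C2", "C2"],
})

# keep=False marks every row that belongs to a duplicate group.
with_subset = df[df.duplicated(["col1", "col2", "col3"], keep=False)]

# subset defaults to all columns, so this is equivalent for this DataFrame.
without_subset = df[df.duplicated(keep=False)]

# Default keep="first": only the second A1 B1 C1 row is marked as a duplicate.
first_only = df[df.duplicated(["col1", "col2", "col3"])]

print(with_subset)
```

If the real DataFrame has extra columns that should not take part in the comparison, keep the explicit column list. Without it, a difference in any other column would stop rows from counting as duplicates.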