Remove non-duplicated rows from pandas

Question

This is rather simple, but I can't get my head around it. Given the following data frame, I want to keep only the rows with duplicated values in column y:

The desired output looks like:

I tried this:

df[~df.duplicated('y')]

but I get this:

Anton vBR · Accepted Answer

Docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html

keep : {‘first’, ‘last’, False}, default ‘first’

first : Mark duplicates as True except for the first occurrence.

last : Mark duplicates as True except for the last occurrence.

False : Mark all duplicates as True.
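To see what each `keep` option marks, here is a minimal sketch on a made-up frame. The column names `x` and `y` and the values are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical data: 'x' and 'y' are illustrative, not the frame from the question.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": ["a", "b", "a", "c", "a"]})

# keep='first' (the default): every repeat after the first occurrence is True.
first_mask = df.duplicated("y")

# keep='last': every repeat before the last occurrence is True.
last_mask = df.duplicated("y", keep="last")

# keep=False: every row whose 'y' value occurs more than once is True.
all_mask = df.duplicated("y", keep=False)

print(first_mask.tolist())  # [False, False, True, False, True]
print(last_mask.tolist())   # [True, False, True, False, False]
print(all_mask.tolist())    # [True, False, True, False, True]
```

Only `keep=False` flags the first "a" row as well, which is what matters when every member of a duplicated group should be selected.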

Meaning you are looking for:

df[df.duplicated('y', keep=False)]

Output:
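On a small made-up frame (the columns `x` and `y` and their values are assumptions, not the asker's data), the attempt from the question and the `keep=False` filter can be compared side by side. This is a sketch, not a reproduction of the original output.

```python
import pandas as pd

# Hypothetical data standing in for the frame from the question.
frame = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": ["a", "b", "a", "c", "a"]})

# The attempt from the question: negating the default mask keeps the first
# occurrence of every 'y' value, so unique rows survive and repeats are dropped.
attempt = frame[~frame.duplicated("y")]

# The filter from the answer: keep only rows whose 'y' value occurs more than once.
dupes_only = frame[frame.duplicated("y", keep=False)]

print(attempt["x"].tolist())     # [1, 2, 4]
print(dupes_only["x"].tolist())  # [1, 3, 5]

# The attempt behaves like drop_duplicates on the same column.
print(attempt.equals(frame.drop_duplicates("y")))  # True
```

The attempt is effectively `drop_duplicates('y')`, which is why the non-duplicated rows were still in the result.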
