Remove all groups with more than N observations

Question

If a value occurs more than two times in a column, I want to drop every row in which it occurs.

The input df would look like:

The output df would look like:

Name   Num
  Y     3
  Y     4

I know drop_duplicates can remove duplicates, but it only lets me keep the first or the last occurrence of each value; it cannot drop rows starting from the nth occurrence.

df = df.drop_duplicates(subset = ['Name'], drop='third')

This code is not valid (drop_duplicates has no drop parameter), but it shows what I am trying to do.
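For context, drop_duplicates picks which duplicate survives through its keep argument, which accepts only 'first', 'last', or False. The sketch below uses a small hypothetical frame (the question's actual input is not shown) to illustrate why none of these express "more than two occurrences": keep=False drops every value that appears at least twice, so both groups disappear.

```python
import pandas as pd

# Hypothetical sample data; the question's real input frame is not shown
df = pd.DataFrame({"Name": ["X", "X", "X", "Y", "Y"],
                   "Num": [1, 2, 5, 3, 4]})

# keep='first': keep the first row of each Name, drop later repeats
first = df.drop_duplicates(subset=["Name"], keep="first")

# keep='last': keep the last row of each Name
last = df.drop_duplicates(subset=["Name"], keep="last")

# keep=False: drop every row whose Name appears more than once,
# which also removes Y even though Y only appears twice
no_dups = df.drop_duplicates(subset=["Name"], keep=False)

print(first)
print(last)
print(no_dups)
```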

BENY · Accepted Answer

Using head keeps at most the first two rows of each group, so the X group is truncated rather than removed:

df.groupby('Name').head(2)
Out[375]: 
  Name  Num
0    X    1
1    X    2
2    Y    3
3    Y    4
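The head(2) call is closest to the "drop the third occurrence" idea from the question. A minimal sketch of the same behaviour with an explicit N, assuming groupby().cumcount() (which numbers rows 0, 1, 2, ... within each group); the helper name keep_first_n and the sample data are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the question's real input frame is not shown
df = pd.DataFrame({"Name": ["X", "X", "X", "Y", "Y"],
                   "Num": [1, 2, 5, 3, 4]})

def keep_first_n(frame, col, n):
    """Keep at most the first n rows of each group in `col` (hypothetical helper).

    cumcount() gives each row its position within its group, so the
    mask `< n` drops the (n+1)th occurrence onward, like head(n).
    """
    return frame[frame.groupby(col).cumcount() < n]

out = keep_first_n(df, "Name", 2)
print(out)
```

Both approaches preserve the original row order and index, so for this data they select the same rows.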

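To drop every row of any group with more than two rows, filter on group size instead:

# Boolean Series indexed by Name: True where the group has at most 2 rows
s = df.groupby('Name').size() <= 2
# Keep only rows whose Name belongs to one of those small-enough groups
df.loc[df.Name.isin(s[s].index)]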
Out[380]: 
  Name  Num
2    Y    3
3    Y    4
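The same size filter can be written without the intermediate index lookup, which makes the threshold N from the title a parameter. A minimal sketch, assuming groupby(...).transform("size") to broadcast each group's row count back onto its rows; the helper name drop_large_groups and the sample data are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the question's real input frame is not shown
df = pd.DataFrame({"Name": ["X", "X", "X", "Y", "Y"],
                   "Num": [1, 2, 5, 3, 4]})

def drop_large_groups(frame, col, n):
    """Drop every row belonging to a group in `col` with more than n rows."""
    # transform("size") returns a Series aligned with `frame`, holding the
    # size of the group each row belongs to
    sizes = frame.groupby(col)[col].transform("size")
    return frame[sizes <= n]

out = drop_large_groups(df, "Name", 2)
print(out)
```

With this data only the two Y rows survive. groupby(col).filter(lambda g: len(g) <= n) gives the same result but calls a Python function per group, which tends to be slower when there are many groups.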
