Reputation: 458
If a value occurs more than two times in a column I want to drop every row that it occurs in.
The input df would look like:
Name Num
X 1
X 2
Y 3
Y 4
X 5
The output df would look like:
Name Num
Y 3
Y 4
I know it is possible to remove duplicates, but that only works if I want to remove the first or last duplicate that is found, not the nth duplicate.
df = df.drop_duplicates(subset = ['Name'], drop='third')
This code is completely wrong but it helps explain what I was trying to do.
Upvotes: 0
Views: 164
Reputation: 862571
Use GroupBy.cumcount
for counter and filter all values less like 2
:
df1 = df[df.groupby('Name').cumcount() < 3]
print (df1)
Name Num
0 X 1
1 X 2
2 Y 3
3 Y 4
Detail:
print (df.groupby('Name').cumcount())
0 0
1 1
2 0
3 1
4 2
dtype: int64
EDIT
Filter by GroupBy.transform
and GroupBy.size
:
df1 = df[df.groupby('Name')['Num'].transform('size') < 3]
print (df1)
Name Num
2 Y 3
3 Y 4
Upvotes: 4
Reputation: 323226
Using head
df.groupby('Name').head(2)
Out[375]:
Name Num
0 X 1
1 X 2
2 Y 3
3 Y 4
s=df.groupby('Name').size()<=2
df.loc[df.Name.isin(s[s].index)]
Out[380]:
Name Num
2 Y 3
3 Y 4
Upvotes: 4