Reputation: 97
I have a df with columns name and subject. For each name, I want to drop duplicate math rows within each block that starts with a first row, keeping only the first math row in that block:
name subject
0 mason first
1 mason math
2 mason math
3 mason first
4 mason chem
5 mason math
6 mason math
7 paul first
8 paul chem
9 paul first
10 paul math
11 paul math
Expected final df (index reset):
name subject
0 mason first
1 mason math
2 mason first
3 mason chem
4 mason math
5 paul first
6 paul chem
7 paul first
8 paul math
Upvotes: 1
Views: 144
Reputation: 75080
Here is one way: take a cumulative sum of subject == "first" to label each block, group by name and that label, and use df.groupby.apply to build a boolean mask that keeps every non-math row plus the first math row in each group:
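# Block label: increments at every "first" row
c1 = df['subject'].eq("first").cumsum()
# group_keys=False keeps the original index, so the mask still aligns with df
# in recent pandas versions (which otherwise prepend the group keys)
mask = (df.groupby(["name", c1], group_keys=False)['subject']
          .apply(lambda x: (~x.duplicated() & x.eq("math")) | x.ne('math')))
out = df[mask]
print(out)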
name subject
0 mason first
1 mason math
3 mason first
4 mason chem
5 mason math
7 paul first
8 paul chem
9 paul first
10 paul math
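For comparison, here is a minimal self-contained sketch of the same idea without apply. It is not part of the approach above, only an alternative under the same assumptions: the sample frame is rebuilt from the question, and the helper names block, repeated, and result are made up for illustration. It marks rows that repeat an earlier (name, block, subject) combination with DataFrame.duplicated and drops only those whose subject is math.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "name": ["mason"] * 7 + ["paul"] * 5,
    "subject": ["first", "math", "math", "first", "chem", "math", "math",
                "first", "chem", "first", "math", "math"],
})

# Block label: increments at every "first" row
block = df["subject"].eq("first").cumsum()

# True for rows that repeat an earlier (name, block, subject) combination
repeated = df.assign(block=block).duplicated(["name", "block", "subject"])

# Drop only the repeated "math" rows; other repeated subjects are kept
out = df[~(repeated & df["subject"].eq("math"))]

# Optional: renumber the index as in the question's expected frame
result = out.reset_index(drop=True)
print(result)
```

Here out keeps the original index labels, as in the output above, while result is renumbered from 0 like the expected frame in the question.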
Upvotes: 4