Here is a simplified example of my pandas dataframe:
User Binary
0 UserA 0
1 UserA 0
2 UserA 0
3 UserA 1
4 UserA 0
5 UserA 1
6 UserA 0
7 UserA 0
8 UserB 0
9 UserB 0
10 UserB 0
11 UserB 0
12 UserB 0
13 UserB 1
14 UserB 1
15 UserB 0
16 UserC 0
17 UserC 0
For each User, I would like to remove all rows after the first occurrence of Binary=1, keeping that row itself. Note that some Users have no Binary=1 at all (e.g. UserC in this example); all of their rows should be kept.
The expected output is:
User Binary
0 UserA 0
1 UserA 0
2 UserA 0
3 UserA 1
8 UserB 0
9 UserB 0
10 UserB 0
11 UserB 0
12 UserB 0
13 UserB 1
16 UserC 0
17 UserC 0
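For reproducibility, here is the sample frame as code, together with a minimal sketch of one vectorized reading of the rule: keep a row only if no Binary=1 appears earlier in the same User group. This is an alternative technique, not one of the answers below, and it assumes Binary holds only 0/1 and that rows are already in the intended order within each User.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User": ["UserA"] * 8 + ["UserB"] * 8 + ["UserC"] * 2,
    "Binary": [0, 0, 0, 1, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 1, 1, 0,
               0, 0],
})

# Index labels of the rows the question expects to keep
expected_index = [0, 1, 2, 3, 8, 9, 10, 11, 12, 13, 16, 17]

# Running count of 1s per User minus the current row's value
# = number of 1s strictly *before* this row in the same group (assumes 0/1 values)
ones_before = df.groupby("User")["Binary"].cumsum() - df["Binary"]

# Keep rows that have no earlier 1; the first 1 itself has ones_before == 0
out = df[ones_before.eq(0)]
print(out)
```

Because the first 1 in each group still has zero 1s before it, it is kept, while every later row in that group is dropped; a group with no 1s never accumulates a count, so all its rows survive.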
Here's one approach using groupby and transform with a custom function:
# check which Binary values are 1 and group the Series by User
g = df.Binary.eq(1).groupby(df.User)
# transform to the label of the first True (idxmax) or, if the group has
# no 1 at all, to the group's last label so that every row is kept
m = g.transform(lambda x: x.idxmax() if x.any() else x.index[-1])
# keep rows whose index label is smaller than or equal to m
# (this assumes labels increase in row order, e.g. a default RangeIndex)
out = df[df.index <= m]
print(out)
User Binary
0 UserA 0
1 UserA 0
2 UserA 0
3 UserA 1
8 UserB 0
9 UserB 0
10 UserB 0
11 UserB 0
12 UserB 0
13 UserB 1
16 UserC 0
17 UserC 0
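The idxmax answer selects rows by comparing index labels (df.index <= m), so it implicitly relies on the labels increasing in row order, as the default RangeIndex does. A sketch of a variant that relies only on row order within each group, using a shifted running maximum (a swapped-in technique, not part of the answer above; the function name keep_until_first_one is hypothetical):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "User": ["UserA"] * 8 + ["UserB"] * 8 + ["UserC"] * 2,
    "Binary": [0, 0, 0, 1, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 1, 1, 0,
               0, 0],
})


def keep_until_first_one(frame):
    """Hypothetical helper: keep each User's rows up to and including its first 1."""
    # Running max of Binary over the *previous* rows of the same User;
    # it stays 0 until a 1 has been passed, whatever the index labels are
    prev_max = frame.groupby("User")["Binary"].transform(
        lambda s: s.cummax().shift(fill_value=0)
    )
    return frame[prev_max.eq(0)]


out = keep_until_first_one(df)

# Same rows in the same order, but with labels that decrease from top to bottom
relabelled = df.copy()
relabelled.index = range(len(df) - 1, -1, -1)
out_relabelled = keep_until_first_one(relabelled)

print(out)
print(out_relabelled)
```

Both calls keep the same 12 rows (only their labels differ); applied to the relabelled frame, the label comparison df.index <= m would not select the same rows.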
The idea is to reverse the row order with DataFrame.iloc[::-1], take the cumulative sum of Binary within each User, and keep the rows where it equals the group's maximum. This also handles groups that contain only 0s or only 1s correctly:
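def f(x):
    # x arrives in reversed order, so the cumulative sum reaches its maximum
    # at the first 1 (in original order) and stays there for all earlier rows
    s = x.cumsum()
    return s.eq(s.max())

df = df[df.iloc[::-1].groupby('User')['Binary'].transform(f).sort_index()]
print(df)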
User Binary
0 UserA 0
1 UserA 0
2 UserA 0
3 UserA 1
8 UserB 0
9 UserB 0
10 UserB 0
11 UserB 0
12 UserB 0
13 UserB 1
16 UserC 0
17 UserC 0
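To see why the reversed cumulative sum works, here is a small trace of the same steps as f() on UserB's values alone, plus the two edge cases mentioned above (a group of only 0s and a group of only 1s). The helper name keep_mask is hypothetical and exists only for this illustration.

```python
import pandas as pd

# UserB's Binary values from the question, with their original labels 8..15
binary = pd.Series([0, 0, 0, 0, 0, 1, 1, 0], index=range(8, 16))

# Same steps as f() in the answer: reverse, cumulative sum, compare to the max
s = binary.iloc[::-1].cumsum()
mask = s.eq(s.max()).sort_index()

print(s.sort_index().tolist())  # [2, 2, 2, 2, 2, 2, 1, 0]
print(mask.tolist())            # rows 8..13 True, rows 14..15 False


def keep_mask(values):
    """Hypothetical helper: the reversed-cumsum mask for one group's values."""
    s = pd.Series(values).iloc[::-1].cumsum()
    return s.eq(s.max()).sort_index().tolist()


print(keep_mask([0, 0]))     # [True, True]  (no 1: every row kept)
print(keep_mask([1, 1, 1]))  # [True, False, False]  (only the first 1 kept)
```

In original order the reversed cumulative sum counts the 1s at or after each row, so it equals the group total exactly up to and including the first 1; for an all-zero group the total is 0 and every row matches.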