MRHarv

Reputation: 503

Keep rows until the first occurrence of a value per group

Here is a simplified example of my pandas dataframe:

     User  Binary
0   UserA       0
1   UserA       0
2   UserA       0
3   UserA       1
4   UserA       0
5   UserA       1
6   UserA       0
7   UserA       0
8   UserB       0
9   UserB       0
10  UserB       0
11  UserB       0
12  UserB       0
13  UserB       1
14  UserB       1
15  UserB       0
16  UserC       0
17  UserC       0

For each User, I would like to remove all rows after the first occurrence of Binary=1, keeping that row itself. Note that some Users have no rows with Binary=1 (e.g. UserC in this example); all of their rows should be kept.

The expected output is:

     User  Binary
0   UserA       0
1   UserA       0
2   UserA       0
3   UserA       1
8   UserB       0
9   UserB       0
10  UserB       0
11  UserB       0
12  UserB       0
13  UserB       1
16  UserC       0
17  UserC       0
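To try the answers below, the sample frame can be rebuilt from the table above. This is a sketch that transcribes the printed rows and assumes the default integer index 0 to 17 shown there:

```python
import pandas as pd

# Sample data transcribed from the table above (default RangeIndex 0..17)
df = pd.DataFrame({
    'User': ['UserA'] * 8 + ['UserB'] * 8 + ['UserC'] * 2,
    'Binary': [0, 0, 0, 1, 0, 1, 0, 0,   # UserA, rows 0-7
               0, 0, 0, 0, 0, 1, 1, 0,   # UserB, rows 8-15
               0, 0],                    # UserC, rows 16-17
})
print(df)
```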

Upvotes: 1


Answers (2)

yatu

Reputation: 88236

Here's one approach using groupby and transforming with a custom function:

# check which Binary values are 1 and group the series by User
g = df.Binary.eq(1).groupby(df.User)
# transform each group to the index label of its first True (idxmax),
# or to its last index label if the group has no True at all
m = g.transform(lambda x: x.idxmax() if x.any() else x.index[-1])
# keep rows whose index is smaller than or equal to m
out = df[df.index <= m]

print(out)

     User  Binary
0   UserA       0
1   UserA       0
2   UserA       0
3   UserA       1
8   UserB       0
9   UserB       0
10  UserB       0
11  UserB       0
12  UserB       0
13  UserB       1
16  UserC       0
17  UserC       0
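Printing the intermediate cutoff makes the logic easier to follow: every row of a user carries the same label in m (the label of that user's first 1, or the user's last label if there is none), and a row survives if its own label is at or below it. A self-contained sketch, which like the answer assumes index labels increase in row order as in the sample:

```python
import pandas as pd

# Sample frame rebuilt from the question (default index 0..17)
df = pd.DataFrame({
    'User': ['UserA'] * 8 + ['UserB'] * 8 + ['UserC'] * 2,
    'Binary': [0, 0, 0, 1, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 1, 1, 0,
               0, 0],
})

g = df.Binary.eq(1).groupby(df.User)
m = g.transform(lambda x: x.idxmax() if x.any() else x.index[-1])

# one cutoff label per user: UserA -> 3, UserB -> 13, UserC -> 17
print(m.groupby(df.User).first())

# rows at or before their user's cutoff label are kept
out = df[df.index <= m]
print(out.index.tolist())
```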

Upvotes: 2

jezrael

Reputation: 862841

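The idea is to reverse the row order with DataFrame.iloc[::-1], take the cumulative sum of Binary within each group, and keep the rows where it equals the group maximum. Walking a group backwards, the cumulative sum reaches its maximum at the first 1 (in original order) and stays there for every earlier row. This also works correctly for groups containing only 0s or only 1s: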

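def f(x):
    # cumulative sum over the reversed group; it equals its max from the first 1 backwards
    s = x.cumsum()
    return s.eq(s.max())
# reverse rows, build the boolean mask per User, restore original order, then filter
df = df[df.iloc[::-1].groupby('User')['Binary'].transform(f).sort_index()]
print(df)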
     User  Binary
0   UserA       0
1   UserA       0
2   UserA       0
3   UserA       1
8   UserB       0
9   UserB       0
10  UserB       0
11  UserB       0
12  UserB       0
13  UserB       1
16  UserC       0
17  UserC       0
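The reversed cumulative sum can be checked one group at a time. Below, keep_mask is a hypothetical helper (not part of the answer) that applies the answer's f to a single group in reversed order and returns the mask in the original order; the three calls cover UserB, an all-0 group like UserC, and an all-1 group:

```python
import pandas as pd

def f(x):
    # same helper as in the answer
    s = x.cumsum()
    return s.eq(s.max())

def keep_mask(values):
    # hypothetical helper: apply f to one group reversed, then restore original order
    x = pd.Series(values)
    return f(x.iloc[::-1]).sort_index().tolist()

print(keep_mask([0, 0, 0, 0, 0, 1, 1, 0]))  # [True, True, True, True, True, True, False, False]
print(keep_mask([0, 0]))                    # [True, True]
print(keep_mask([1, 1, 1]))                 # [True, False, False]
```

In the all-1 case only the first row is kept, because the reversed cumulative sum reaches its maximum only at the last step of the walk.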

Upvotes: 1
