select the last 2 values in the groupby with condition

Question

I need to select the last row for each user_id and date, but when the last value in the metric column is 'leave', I need the last 2 rows of that group (if they exist). My data:

import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "subscription": [1, 1, 2, 3, 4, 5],
    "metric": ['enter', 'stay', 'leave', 'enter', 'leave', 'enter'],
    "date": ['2020-01-01', '2020-01-01', '2020-03-01', '2020-01-01', '2020-01-01', '2020-01-02'],
})
#result
    user_id subscription    metric  date
0   1       1               enter   2020-01-01
1   1       1               stay    2020-01-01
2   1       2               leave   2020-03-01
3   2       3               enter   2020-01-01
4   2       4               leave   2020-01-01
5   2       5               enter   2020-01-02

Expected output:

    user_id subscription    metric  date
1   1       1               stay    2020-01-01
2   1       2               leave   2020-03-01
3   2       3               enter   2020-01-01 # kept because the last metric in its [user_id, date] group is 'leave'
4   2       4               leave   2020-01-01
5   2       5               enter   2020-01-02

What I've tried: drop_duplicates and groupby. Both give the same result, keeping only the last row of each group:

df.drop_duplicates(['user_id', 'date'], keep='last')
#or
df.groupby(['user_id', 'date']).tail(1)
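
For reference, here is a minimal sketch that runs both attempts on the sample data and compares the row labels they keep. The variable names `by_drop` and `by_tail` are introduced only for illustration. Both calls keep exactly one row per (user_id, date) group, so row 3 is missing from either result.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "subscription": [1, 1, 2, 3, 4, 5],
    "metric": ["enter", "stay", "leave", "enter", "leave", "enter"],
    "date": ["2020-01-01", "2020-01-01", "2020-03-01",
             "2020-01-01", "2020-01-01", "2020-01-02"],
})

# Attempt 1: keep the last row among duplicates of (user_id, date)
by_drop = df.drop_duplicates(["user_id", "date"], keep="last")

# Attempt 2: keep the last row of every (user_id, date) group
by_tail = df.groupby(["user_id", "date"]).tail(1)

# Both print the same labels; the second-to-last row of the
# (2, '2020-01-01') group is not among them
print(by_drop.index.tolist())
print(by_tail.index.tolist())
```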

David Erickson · Accepted Answer

You can use boolean masking: build three boolean conditions, stored in the variables a, b and c, then keep every row where at least one of them is True by combining them with the or operator |:

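# a: True on rows whose metric is 'leave' (cumcount makes each row its own group)
a = df.groupby(['user_id', 'date', df.groupby(['user_id', 'date']).cumcount()])['metric'].transform('last') == 'leave'
# b: True on rows that are alone in their (user_id, date) group
b = df.groupby(['user_id', 'date'])['metric'].transform('count') == 1
# c: True on the row just before a 'leave' row, unless that row is alone in its group
c = a.shift(-1, fill_value=False) & ~b
df = df[a | b | c]
print(a, b, c)
df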

#a: group by the two required columns plus the cumulative count within each group, so that each row is evaluated on its own; the result is True where "metric" is 'leave'.
0    False
1    False
2     True
3    False
4     True
5    False
Name: metric, dtype: bool
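
As a side check on mask a, here is a small sketch under the assumption that adding `cumcount()` as a third grouping key gives every row a unique key, so `transform('last')` simply returns each row's own metric. If that holds, a should match a plain comparison on the column. The names `keys` and `simple` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "subscription": [1, 1, 2, 3, 4, 5],
    "metric": ["enter", "stay", "leave", "enter", "leave", "enter"],
    "date": ["2020-01-01", "2020-01-01", "2020-03-01",
             "2020-01-01", "2020-01-01", "2020-01-02"],
})

keys = ["user_id", "date"]

# Mask a as written in the answer: group by the keys plus the row's
# position inside its (user_id, date) group
a = df.groupby(keys + [df.groupby(keys).cumcount()])["metric"].transform("last") == "leave"

# Plain element-wise comparison on the same column
simple = df["metric"] == "leave"

# Compare the two masks row by row
print((a == simple).all())
```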

#b: if a group contains only one row, keep that row.
0    False
1    False
2     True
3    False
4    False
5     True
Name: metric, dtype: bool

#c: use .shift(-1) to flag the row immediately before a 'leave' row. For the condition to be satisfied, the flagged row's group must also contain more than one row.
0    False
1     True
2    False
3     True
4    False
5    False
Name: metric, dtype: bool

Out[18]: 
   user_id  subscription metric        date
1        1             1   stay  2020-01-01
2        1             2  leave  2020-03-01
3        2             3  enter  2020-01-01
4        2             4  leave  2020-01-01
5        2             5  enter  2020-01-02
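
Note that `.shift(-1)` looks at the next row of the whole frame, not the next row of the same group. A possible alternative, sketched below, computes everything per group instead. It assumes the rows are already in the desired order within each (user_id, date) group; `pos_from_end`, `last_is_leave` and `result` are names introduced here for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "subscription": [1, 1, 2, 3, 4, 5],
    "metric": ["enter", "stay", "leave", "enter", "leave", "enter"],
    "date": ["2020-01-01", "2020-01-01", "2020-03-01",
             "2020-01-01", "2020-01-01", "2020-01-02"],
})

g = df.groupby(["user_id", "date"])

# Position counted from the end of each group: 0 = last row, 1 = second to last
pos_from_end = g.cumcount(ascending=False)

# True on every row of a group whose final metric is 'leave'
last_is_leave = g["metric"].transform("last") == "leave"

# Always keep the last row; keep the second-to-last row only for 'leave' groups
result = df[(pos_from_end == 0) | ((pos_from_end == 1) & last_is_leave)]
print(result)
```

On the sample data this keeps rows 1 through 5, the same rows as the expected output in the question.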
