For each mem group, I want to keep only the target == 0 rows that come before the first target == 1 row, and discard that target == 1 row and every row after it.
How can I do this?
import numpy as np
import pandas as pd

mem = np.repeat(['a', 'b', 'c'], 3)
date = ['2021-12-07', '2022-01-26', '2022-02-03', '2022-01-04', '2022-02-23', '2022-02-25', '2021-10-15', '2021-10-24', '2022-01-08']
target = [0, 0, 1, 0, 0, 1, 0, 1, 0]
dfTemp = pd.DataFrame({'mem': mem,
                       'date': date,
                       'target': target})
dfTemp
   mem        date  target
0    a  2021-12-07       0
1    a  2022-01-26       0
2    a  2022-02-03       1
3    b  2022-01-04       0
4    b  2022-02-23       0
5    b  2022-02-25       1
6    c  2021-10-15       0
7    c  2021-10-24       1
8    c  2022-01-08       0
Result I want:
   mem        date  target
0    a  2021-12-07       0
1    a  2022-01-26       0
3    b  2022-01-04       0
4    b  2022-02-23       0
6    c  2021-10-15       0
Create a boolean mask by grouping by mem, and select the rows where the cumulative sum of target within each group is still smaller than 1, i.e. the rows before the first 1:
dfTemp.loc[dfTemp.groupby("mem").target.cumsum().lt(1)]
Output:
   mem        date  target
0    a  2021-12-07       0
1    a  2022-01-26       0
3    b  2022-01-04       0
4    b  2022-02-23       0
6    c  2021-10-15       0
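A self-contained version of the answer, with the imports the question's snippet relies on, is sketched below. It assumes the rows are already in chronological order within each mem group, as they are in the example data.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question
mem = np.repeat(['a', 'b', 'c'], 3)
date = ['2021-12-07', '2022-01-26', '2022-02-03', '2022-01-04', '2022-02-23',
        '2022-02-25', '2021-10-15', '2021-10-24', '2022-01-08']
target = [0, 0, 1, 0, 0, 1, 0, 1, 0]
dfTemp = pd.DataFrame({'mem': mem, 'date': date, 'target': target})

# Running count of 1s seen so far within each mem group
running = dfTemp.groupby("mem")["target"].cumsum()

# Keep rows where no 1 has been seen yet; this also drops the first 1 itself
result = dfTemp.loc[running.lt(1)]
print(result)
```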
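The Series dfTemp.groupby("mem").target.cumsum() becomes greater than 0 at the first 1 in each group and, because target has no negative values, stays above 0 for every later row in that group. That is why row 8 is dropped even though its own target is 0: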
dfTemp.groupby("mem").target.cumsum()
0 0
1 0
2 1
3 0
4 0
5 1
6 0
7 1
8 1
Name: target, dtype: int64
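Because target only holds 0 and 1 here, a grouped running maximum gives the same mask: it is 0 until the first 1 in a group and 1 from then on. The sketch below is an alternative, not the answerer's code. It also sorts by mem and date first, on the assumption that real data may not already be in date order (the example data is); keep_before_first_hit is a hypothetical helper name.

```python
import pandas as pd

def keep_before_first_hit(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per mem group, keep rows before the first target == 1."""
    # Assumption: dates are parseable strings; converting them gives true chronological order
    ordered = df.assign(date=pd.to_datetime(df["date"])).sort_values(["mem", "date"])
    # Running max stays 0 until the first 1 in each group, then is 1 for every later row
    seen_one = ordered.groupby("mem")["target"].cummax()
    return ordered.loc[seen_one.eq(0)]

df = pd.DataFrame({
    "mem": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "date": ["2021-12-07", "2022-01-26", "2022-02-03", "2022-01-04", "2022-02-23",
             "2022-02-25", "2021-10-15", "2021-10-24", "2022-01-08"],
    "target": [0, 0, 1, 0, 0, 1, 0, 1, 0],
})
out = keep_before_first_hit(df)
print(out)
```

Groups that never reach target == 1 are kept in full with either approach, since their running sum or running maximum never leaves 0.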