sunghwan jang

Reputation: 11

How can I keep only the rows before the first target == 1 in each group?

For each mem group, I want to keep the rows with target == 0 that come before the first target == 1, and discard that target == 1 row and every row after it.

How can I do this?

import numpy as np
import pandas as pd

mem = np.repeat(['a','b','c'], 3)
date = ['2021-12-07', '2022-01-26', '2022-02-03', '2022-01-04', '2022-02-23', '2022-02-25', '2021-10-15', '2021-10-24', '2022-01-08']
target = [0,0,1,0,0,1,0,1,0]
dfTemp = pd.DataFrame({'mem' : mem,
                        'date' : date,
                        'target' : target})
dfTemp

  mem        date  target
0   a  2021-12-07       0
1   a  2022-01-26       0
2   a  2022-02-03       1
3   b  2022-01-04       0
4   b  2022-02-23       0
5   b  2022-02-25       1
6   c  2021-10-15       0
7   c  2021-10-24       1
8   c  2022-01-08       0

Desired result:

  mem        date  target
0   a  2021-12-07       0
1   a  2022-01-26       0
3   b  2022-01-04       0
4   b  2022-02-23       0
6   c  2021-10-15       0

Upvotes: 1

Views: 59

Answers (1)

mcsoini

Reputation: 6642

Create a boolean mask by grouping by mem and taking the cumulative sum of target within each group, then select the rows where that sum is still smaller than 1:

dfTemp.loc[dfTemp.groupby("mem").target.cumsum().lt(1)]

Output:

  mem        date  target
0   a  2021-12-07       0
1   a  2022-01-26       0
3   b  2022-01-04       0
4   b  2022-02-23       0
6   c  2021-10-15       0

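The Series dfTemp.groupby("mem").target.cumsum() jumps to a value greater than 0 as soon as the first 1 is encountered within each group, and it stays there for the rest of that group. That is why the mask drops the first target == 1 row and everything after it, including row 8, whose own target is 0: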

dfTemp.groupby("mem").target.cumsum()

0    0
1    0
2    1
3    0
4    0
5    1
6    0
7    1
8    1
Name: target, dtype: int64
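
For completeness, here is a minimal self-contained sketch of the same approach, split into steps. It adds one assumption that is not in the question: that the rows should be in chronological order within each mem, so it sorts by mem and date first (the ISO-formatted date strings sort correctly as plain text). The names dfSorted, seen and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
dfTemp = pd.DataFrame({
    'mem': np.repeat(['a', 'b', 'c'], 3),
    'date': ['2021-12-07', '2022-01-26', '2022-02-03',
             '2022-01-04', '2022-02-23', '2022-02-25',
             '2021-10-15', '2021-10-24', '2022-01-08'],
    'target': [0, 0, 1, 0, 0, 1, 0, 1, 0],
})

# Assumption: "before" means earlier by date within each mem.
# The sample data is already in that order, so this is a no-op here.
dfSorted = dfTemp.sort_values(['mem', 'date'])

# Running count of 1s seen so far in each group, current row included.
seen = dfSorted.groupby('mem')['target'].cumsum()

# Keep only the rows where no 1 has been seen yet in the group.
result = dfSorted.loc[seen.lt(1)]
print(result)
```

Note that a group containing no target == 1 at all keeps every one of its rows, because its running sum never leaves 0.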

Upvotes: 2
