Náthali
Náthali

Reputation: 937

python pandas cumsum with groupby and condition

I have this dataframe

In [1]: import pandas as pd                                                                                                                                                                                                                                                                                       

In [2]: data = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'], 'Tag': ['X', '', 'X', '', 'X', '',''], 'Counts': [1,3,5,2,3,2,1]})                                                                                                                                                                        

In [3]: data                                                                                                                                                                                                                                                                                                      
Out[3]: 
  ID Tag  Counts
0  A   X       1
1  A           3
2  A   X       5
3  A           2
4  B   X       3
5  B           2
6  B           1

I want to create a new column with cum sum group by column=ID but restart sum if column Tag = X

In [6]: data['before'] = data.groupby(['ID']).Counts.cumsum()                                                                                                                                                                                                                                                     

In [7]: data['after'] = [1,4,5,7,3,5,6]                                                                                                                                                                                                                                                                           

In [8]: data                                                                                                                                                                                                                                                                                                      
Out[8]: 
  ID Tag  Counts  before  after
0  A   X       1       1      1
1  A           3       4      4
2  A   X       5       9      5
3  A           2      11      7
4  B   X       3       3      3
5  B           2       5      5
6  B           1       6      6

I want to get column 'after'

Upvotes: 3

Views: 599

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

You can use .eq('X').cumsum() to identify the group starting with X, which you can use in groupby together with 'ID':

data['after'] = data.groupby(['ID',data.Tag.eq('X').cumsum()])['Counts'].cumsum()

Output:

  ID Tag  Counts  after
0  A   X       1      1
1  A           3      4
2  A   X       5      5
3  A           2      7
4  B   X       3      3
5  B           2      5
6  B           1      6

Upvotes: 3

Related Questions