Reputation: 937
I have this dataframe
In [1]: import pandas as pd
In [2]: data = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'], 'Tag': ['X', '', 'X', '', 'X', '',''], 'Counts': [1,3,5,2,3,2,1]})
In [3]: data
Out[3]:
ID Tag Counts
0 A X 1
1 A 3
2 A X 5
3 A 2
4 B X 3
5 B 2
6 B 1
I want to create a new column with cum sum group by column=ID but restart sum if column Tag = X
In [6]: data['before'] = data.groupby(['ID']).Counts.cumsum()
In [7]: data['after'] = [1,4,5,7,3,5,6]
In [8]: data
Out[8]:
ID Tag Counts before after
0 A X 1 1 1
1 A 3 4 4
2 A X 5 9 5
3 A 2 11 7
4 B X 3 3 3
5 B 2 5 5
6 B 1 6 6
I want to get column 'after'
Upvotes: 3
Views: 599
Reputation: 150745
You can use .eq('X').cumsum()
to identify the group starting with X
, which you can use in groupby
together with 'ID'
:
data['after'] = data.groupby(['ID',data.Tag.eq('X').cumsum()])['Counts'].cumsum()
Output:
ID Tag Counts after
0 A X 1 1
1 A 3 4
2 A X 5 5
3 A 2 7
4 B X 3 3
5 B 2 5
6 B 1 6
Upvotes: 3