Reputation: 133
I have data looks like:
ID A Date B
1 5 2019-01-01 4
1 5 2019-01-03 6
1 5 2019-01-15 8
2 9 2019-01-02 2
2 9 2019-01-18 3
And i wont to be:
[Result column C]
ID A Date B **C**
1 5 2019-01-01 4 **9**
1 5 2019-01-03 6 **15**
1 5 2019-01-15 8 **23**
2 9 2019-01-02 2 **11**
2 9 2019-01-18 3 **14**
C[0] = A[0] + B[0] (9)
C[1] = A[1] + B[0] + B[1] (where B[0] + B[1] is C[1]=15)
C[0] = A[2] + B[0] + B[1] + B[2] (where B[0] + B[1] + B[2] is C[2]=23)
C[0] = A[3] + B[3]
C[0] = A[4] + B[3] + B[4]
I solved by for loop, but on 2M row is unacceptable by time execution.
Does anyone have a better approach? Thanks to everyone who wants to help :)
Upvotes: 1
Views: 33
Reputation: 34046
Use df.groupby
with df.cumsum
:
In [544]: df['C'] = df.groupby('ID').B.cumsum() + df.A
In [545]: df
Out[545]:
ID A Date B C
0 1 5 2019-01-01 4 9
1 1 5 2019-01-03 6 15
2 1 5 2019-01-15 8 23
3 2 9 2019-01-02 2 11
4 2 9 2019-01-18 3 14
Upvotes: 2