Milos Bijanic
Milos Bijanic

Reputation: 133

Pandas sum in time

I have data looks like:

ID  A   Date        B
1   5   2019-01-01  4
1   5   2019-01-03  6
1   5   2019-01-15  8
2   9   2019-01-02  2
2   9   2019-01-18  3

And i wont to be:

[Result column C]


ID  A   Date        B   **C**
1   5   2019-01-01  4   **9**
1   5   2019-01-03  6   **15**
1   5   2019-01-15  8   **23**
2   9   2019-01-02  2   **11**
2   9   2019-01-18  3   **14**

C[0] = A[0] + B[0] (9)

C[1] = A[1] + B[0] + B[1] (where B[0] + B[1] is C[1]=15)

C[0] = A[2] + B[0] + B[1] + B[2] (where B[0] + B[1] + B[2] is C[2]=23)

C[0] = A[3] + B[3] 

C[0] = A[4] + B[3] + B[4]

I solved by for loop, but on 2M row is unacceptable by time execution.

Does anyone have a better approach? Thanks to everyone who wants to help :)

Upvotes: 1

Views: 33

Answers (1)

Mayank Porwal
Mayank Porwal

Reputation: 34046

Use df.groupby with df.cumsum:

In [544]: df['C'] = df.groupby('ID').B.cumsum() + df.A

In [545]: df
Out[545]: 
   ID  A        Date  B   C
0   1  5  2019-01-01  4   9
1   1  5  2019-01-03  6  15
2   1  5  2019-01-15  8  23
3   2  9  2019-01-02  2  11
4   2  9  2019-01-18  3  14

Upvotes: 2

Related Questions