Bryce Ramgovind
Bryce Ramgovind

Reputation: 3257

Cumulative Sum by date (Month)

I have a pandas dataframe and I need to work out the cumulative sum for each month.

Date    Amount
2017/01/12  50
2017/01/12  30
2017/01/15  70
2017/01/23  80
2017/02/01  90
2017/02/01  10
2017/02/02  10
2017/02/03  10
2017/02/03  20
2017/02/04  60
2017/02/04  90
2017/02/04  100

The cumulative sum is the trailing sum for each day i.e 01-31. However, some days are missing. The data frame should look like

Date    Sum_Amount
2017/01/12  80
2017/01/15  150
2017/01/23  203
2017/02/01  100
2017/02/02  110
2017/02/03  140
2017/02/04  390

Upvotes: 1

Views: 16695

Answers (1)

jezrael
jezrael

Reputation: 862581

You can use if only need cumsum by months groupby with sum and then group by values of index converted to month:

df.Date = pd.to_datetime(df.Date)
df = df.groupby('Date').Amount.sum()
df = df.groupby(df.index.month).cumsum().reset_index()
print (df)

        Date  Amount
0 2017-01-12      80
1 2017-01-15     150
2 2017-01-23     230
3 2017-02-01     100
4 2017-02-02     110
5 2017-02-03     140
6 2017-02-04     390

But if need but months and years need convert to month period by to_period:

df = df.groupby(df.index.to_period('m')).cumsum().reset_index()

Difference is better seen in changed df - added different year:

print (df)
          Date  Amount
0   2017/01/12      50
1   2017/01/12      30
2   2017/01/15      70
3   2017/01/23      80
4   2017/02/01      90
5   2017/02/01      10
6   2017/02/02      10
7   2017/02/03      10
8   2018/02/03      20
9   2018/02/04      60
10  2018/02/04      90
11  2018/02/04     100

df.Date = pd.to_datetime(df.Date)
df = df.groupby('Date').Amount.sum()
df = df.groupby(df.index.month).cumsum().reset_index()
print (df)
        Date  Amount
0 2017-01-12      80
1 2017-01-15     150
2 2017-01-23     230
3 2017-02-01     100
4 2017-02-02     110
5 2017-02-03     120
6 2018-02-03     140
7 2018-02-04     390

df.Date = pd.to_datetime(df.Date)
df = df.groupby('Date').Amount.sum()
df = df.groupby(df.index.to_period('m')).cumsum().reset_index()
print (df)
        Date  Amount
0 2017-01-12      80
1 2017-01-15     150
2 2017-01-23     230
3 2017-02-01     100
4 2017-02-02     110
5 2017-02-03     120
6 2018-02-03      20
7 2018-02-04     270

Upvotes: 6

Related Questions