Nikhil Mangire
Nikhil Mangire

Reputation: 407

How to calculate cumulative sum over days in a month? using pandas for time series analysis

I Have following dataframe df:

enter image description here

I want a cumulative sum of Volume by PROD Cluster which start doing fresh sum after end of the month, I want output to be as following:

enter image description here

Upvotes: 0

Views: 883

Answers (1)

It_is_Chris
It_is_Chris

Reputation: 14113

Use groupby and cumsum:

# sample data
d1 = pd.DataFrame({'PROD Cluster': ['A']*3,
                   'Date': pd.date_range('2020-01-01', '2020-01-03'),
                   'Volume': [1,2,3]})
d2 = pd.DataFrame({'PROD Cluster': ['A']*3,
                   'Date': pd.date_range('2020-02-01', '2020-02-03'),
                   'Volume': [4,5,6]})
d3 = pd.DataFrame({'PROD Cluster': ['B']*3,
                   'Date': pd.date_range('2020-01-01', '2020-01-03'),
                   'Volume': [2,3,4]})
df = pd.concat([d1,d2, d3]).reset_index(drop=True)

# groupby cluster, month and year then cumsum the volumn of each group
# I grouped by year in case your data spans multiple years
df['cumsum'] = df.groupby(['PROD Cluster', df['Date'].dt.month,
                           df['Date'].dt.year])['Volume'].cumsum()

  PROD Cluster       Date  Volume  cumsum
0            A 2020-01-01       1       1
1            A 2020-01-02       2       3
2            A 2020-01-03       3       6
3            A 2020-02-01       4       4
4            A 2020-02-02       5       9
5            A 2020-02-03       6      15
6            B 2020-01-01       2       2
7            B 2020-01-02       3       5
8            B 2020-01-03       4       9

Upvotes: 2

Related Questions