Reputation: 407
I Have following dataframe df:
I want a cumulative sum of Volume
by PROD Cluster
which start doing fresh sum after end of the month, I want output to be as following:
Upvotes: 0
Views: 883
Reputation: 14113
Use groupby
and cumsum
:
# sample data
d1 = pd.DataFrame({'PROD Cluster': ['A']*3,
'Date': pd.date_range('2020-01-01', '2020-01-03'),
'Volume': [1,2,3]})
d2 = pd.DataFrame({'PROD Cluster': ['A']*3,
'Date': pd.date_range('2020-02-01', '2020-02-03'),
'Volume': [4,5,6]})
d3 = pd.DataFrame({'PROD Cluster': ['B']*3,
'Date': pd.date_range('2020-01-01', '2020-01-03'),
'Volume': [2,3,4]})
df = pd.concat([d1,d2, d3]).reset_index(drop=True)
# groupby cluster, month and year then cumsum the volumn of each group
# I grouped by year in case your data spans multiple years
df['cumsum'] = df.groupby(['PROD Cluster', df['Date'].dt.month,
df['Date'].dt.year])['Volume'].cumsum()
PROD Cluster Date Volume cumsum
0 A 2020-01-01 1 1
1 A 2020-01-02 2 3
2 A 2020-01-03 3 6
3 A 2020-02-01 4 4
4 A 2020-02-02 5 9
5 A 2020-02-03 6 15
6 B 2020-01-01 2 2
7 B 2020-01-02 3 5
8 B 2020-01-03 4 9
Upvotes: 2