Reputation: 1445
I am trying to sum the number of categories for each month and cumulate the previous months. The group by needs to be by month,year, and category. I tried different ways of using cumsum but I just can't get it.
This is the df:
sample=[
{'name':11,'category':'A','year':2017,'month':1},
{'name':22,'category':'A','year':2017,'month':2},
{'name':33,'category':'B','year':2015,'month':1},
{'name':33,'category':'C','year':2017,'month':3},
{'name':33,'category':'B','year':2017,'month':8},
{'name':44,'category':'B','year':2016,'month':4},
{'name':44,'category':'A','year':2017,'month':6},
{'name':55,'category':'C','year':2016,'month':9},
{'name':55,'category':'C','year':2017,'month':5},
{'name':55,'category':'B','year':2017,'month':11}]
sample_df=pd.DataFrame(sample)
I grouped by month,year, and category and aggregated by month
sample_counts = sample_df.groupby(['month','year','category']).agg({
'category': 'count',
}).rename(columns={'category':'category_count'}).reset_index()
The output of sample_counts is this:
month year category category_count
1 2015 B 1
1 2017 A 1
2 2017 A 1
3 2017 C 1
4 2016 B 1
5 2017 C 1
6 2017 A 1
8 2017 B 1
9 2016 C 1
11 2017 B 1
So I want the output to add all the B's (for example) as the months progress
So by 11/2017 the 'B' sum should be 2.
I tried different variations of
cumul_df=sample_counts.groupby(['month','year','category']).sum().groupby(level=0).cumsum().reset_index()
I would like the FINAL output to look like this (only showing for 2017 below, but 2016 and 2018 would be the same logic and look and would be in the same table below):
Category Month Year Cumulative_By_Category
A 1 2017 1
A 2 2017 2
A 3 2017 2
A 4 2017 2
A 5 2017 2
A 6 2017 3
A 7 2017 3
A 8 2017 3
A 9 2017 3
A 10 2017 3
A 11 2017 3
A 12 2017 3
B 1 2017 0
B 2 2017 0
B 3 2017 0
B 4 2017 0
B 5 2017 0
B 6 2017 0
B 7 2017 0
B 8 2017 1
B 9 2017 1
B 10 2017 1
B 11 2017 2
B 12 2017 2
C 1 2017 0
C 2 2017 0
C 3 2017 1
C 4 2017 1
C 5 2017 2
C 6 2017 2
C 7 2017 2
C 8 2017 2
C 9 2017 2
C 10 2017 2
C 11 2017 2
C 12 2017 2
Upvotes: 2
Views: 517
Reputation: 150765
If I understand correctly, this is just cumcount
:
sample_df['cat_count'] = sample_df.groupby(['year','category']).cumcount() + 1
Output:
name category year month cat_count
0 11 A 2017 1 1
1 22 A 2017 2 2
2 33 B 2015 1 1
3 33 C 2017 3 1
4 33 B 2017 8 1
5 44 B 2016 4 1
6 44 A 2017 6 3
7 55 C 2016 9 1
8 55 C 2017 5 2
9 55 B 2017 11 2
Upvotes: 1