Reputation: 81
So I have a dataframe like this
import pandas as pd
df = pd.DataFrame({'item_id':[1,2,3,4,5,6,7,8,9,10], 'category':['A', 'B', 'A', 'C', 'B', 'B', 'C', 'A', 'A', 'C'], 'sales': [100, 150, 300, 1000, 300, 50, 1000, 600, 700, 100]})
   item_id category  sales
0        1        A    100
1        2        B    150
2        3        A    300
3        4        C   1000
4        5        B    300
5        6        B     50
6        7        C   1000
7        8        A    600
8        9        A    700
9       10        C    100
and I want the cumulative percent of total sales for every item, from most sold to least sold. Like this:
df = df.sort_values(by = 'sales', ascending = False)
df['pct_of_total'] = df['sales']/df['sales'].sum()
df['cumsum_pct_of_total'] = df['pct_of_total'].cumsum()
   item_id category  sales  pct_of_total  cumsum_pct_of_total
3        4        C   1000      0.232558             0.232558
6        7        C   1000      0.232558             0.465116
8        9        A    700      0.162791             0.627907
7        8        A    600      0.139535             0.767442
2        3        A    300      0.069767             0.837209
4        5        B    300      0.069767             0.906977
1        2        B    150      0.034884             0.941860
0        1        A    100      0.023256             0.965116
9       10        C    100      0.023256             0.988372
5        6        B     50      0.011628             1.000000
But the catch is that I want to do this not on the whole dataframe, but within each category. I tried a custom function:
def acc_pct(s):
    s = s.sort_values(ascending = False)
    s = s/s.sum()
    s = s.cumsum()
    return s.sort_index()
df.groupby('category').agg({'sales':acc_pct})
But it didn't work. It throws a ValueError: Must produce aggregated value.
I know it has to be possible, because groupby.cumcount(), groupby.cumsum() and groupby.shift() work much like this. How do I do it?
Upvotes: 1
Views: 609
Reputation: 35646
Try dividing sales by the groupby transform('sum') to get pct_of_total, then take the groupby cumsum of the new column:
df = df.sort_values('sales', ascending=False)
df['pct_of_total'] = (
    df['sales'] / df.groupby('category')['sales'].transform('sum')
)
df['cumsum_pct_of_total'] = df.groupby('category')['pct_of_total'].cumsum()
df:
   item_id category  sales  pct_of_total  cumsum_pct_of_total
3        4        C   1000      0.476190             0.476190
6        7        C   1000      0.476190             0.952381
8        9        A    700      0.411765             0.411765
7        8        A    600      0.352941             0.764706
2        3        A    300      0.176471             0.941176
4        5        B    300      0.600000             0.600000
1        2        B    150      0.300000             0.900000
0        1        A    100      0.058824             1.000000
9       10        C    100      0.047619             1.000000
5        6        B     50      0.100000             1.000000
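As a side note, the custom function from the question can probably be kept as it is if it is passed to transform instead of agg. agg expects one value per group, while transform accepts a function that returns one value per row of the group. This is a minimal sketch under that assumption, reusing the question's data and acc_pct; the final sort is only there to make the printout easier to read:

```python
import pandas as pd

df = pd.DataFrame({
    'item_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'category': ['A', 'B', 'A', 'C', 'B', 'B', 'C', 'A', 'A', 'C'],
    'sales': [100, 150, 300, 1000, 300, 50, 1000, 600, 700, 100],
})

def acc_pct(s):
    # Same function as in the question: returns one value per row of the group
    s = s.sort_values(ascending=False)
    s = s / s.sum()
    s = s.cumsum()
    return s.sort_index()

# transform (unlike agg) accepts a result with the same length as the group
df['cumsum_pct_of_total'] = df.groupby('category')['sales'].transform(acc_pct)

# Sort only for display: categories together, best sellers first
print(df.sort_values(['category', 'sales'], ascending=[True, False]))
```

Rows with tied sales inside a category (the two 1000s in C) may receive their cumulative values in either order, since the sort does not define which tied row comes first.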
Upvotes: 1