Reputation: 283
I have a dataframe like this :
Code Notes Qty
AAD 2A 2K 17
AAD 2A 1K 1
AAG 2A 3K 12
AAG 4A 4K 2
AAG 5A 5K 1
ALA 1A 2K 10
ALA 1A 1K 3
And my aim is to find class for EACH code, meaning Qty with biggest 80% of sum is Class 1, and rest 15% is class 2, and rest class 3. So, AAD sum is 18, that means AAD with 2A 2K (its cumsum 16 which is in scope of 80% of 18+4) is class 1 and AAD with 2A 1K is class 2. And so on. Someone help me from my previous question to use pd.cut (I dont know if I am allowed to mention a username here). So I am trying to modified the code and apply to this new problem but the error says 'bins must increase monotically`. This is my code
df['cumsum']=df.groupby('Code')['Qty'].cumsum()
total = df.groupby('Code')['Qty'].sum()
bins = [total*0.15,total*0.05,total*0.8]
labels = ['class3','class2','class1']
df['class']=pd.cut(df['cumsum'], bins=bins, labels=labels, right=False)
this is result I want to achieve :
Code Notes Qty Class
AAD 2A 2K 17 class1
AAD 2A 1K 1 class2
AAG 2A 3K 12 class1
AAG 4A 4K 2 class2
AAG 5A 5K 1 class3
ALA 1A 2K 10 class1
ALA 1A 1K 3 class2
Upvotes: 1
Views: 88
Reputation: 260455
You can combine cut
and groupby.transform
:
df['class'] = pd.cut(df.groupby('Code')['Qty'].transform('sum').rdiv(df['Qty']),
bins=[0, 0.15, 0.8, 1], labels=['class3', 'class2', 'class1'])
NB. if you want 80% of the total per group, this doesn't match the provided output. Maybe you want 80% of the maximum? In which can use transform('max')
.
Output:
Code Notes Qty class
0 AAD 2A 2K 17 class1
1 AAD 2A 1K 1 class3
2 AAG 2A 3K 12 class2
3 AAG 4A 4K 2 class3
4 AAG 5A 5K 1 class3
5 ALA 1A 2K 10 class2
6 ALA 1A 1K 3 class2
If you want to use a groupby.cumsum
(which is not really what you describe):
df['class'] = pd.cut(df.groupby('Code')['Qty'].cumsum().rdiv(df['Qty']),
bins=[0, 0.8, 1], labels=['class2', 'class1'])
Output:
Code Notes Qty class
0 AAD 2A 2K 14 class1
1 AAD 2A 1K 4 class2
2 AAG 2A 3K 12 class1
3 AAG 4A 4K 10 class2
4 AAG 5A 5K 4 class2
5 ALA 1A 2K 10 class1
6 ALA 1A 1K 8 class2
Upvotes: 4
Reputation: 11474
This is what I think you want to do:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Code': ['AAD', 'AAD', 'AAG', 'AAG', 'AAG', 'ALA', 'ALA'],
'Notes': ['2A 2K', '2A 1K', '2A 3K', '4A 4K', '5A 5K', '1A 2K', '1A 1K'],
'Qty': [14, 4, 12, 10, 1, 10, 8]
})
df['cumsum'] = df.groupby('Code')['Qty'].cumsum()
df['total'] = df.groupby('Code')['Qty'].transform('sum')
df['class'] = np.where(df['cumsum'] <= df['total']*0.8, 'class1',
np.where(df['cumsum'] <= df['total']*0.95, 'class3', 'class2'))
df = df.drop(['cumsum', 'total'], axis=1)
print(df)
which returns:
Code Notes Qty class
0 AAD 2A 2K 14 class1
1 AAD 2A 1K 4 class2
2 AAG 2A 3K 12 class1
3 AAG 4A 4K 10 class2
4 AAG 5A 5K 1 class2
5 ALA 1A 2K 10 class1
6 ALA 1A 1K 8 class2
Upvotes: 1