using np cut and raise error bins must increase on python pandas

Question

I have a dataframe like this :

Code    Notes   Qty
AAD     2A 2K   17
AAD     2A 1K    1 
AAG     2A 3K   12 
AAG     4A 4K    2
AAG     5A 5K    1
ALA     1A 2K   10
ALA     1A 1K    3

And my aim is to find class for EACH code, meaning Qty with biggest 80% of sum is Class 1, and rest 15% is class 2, and rest class 3. So, AAD sum is 18, that means AAD with 2A 2K (its cumsum 16 which is in scope of 80% of 18+4) is class 1 and AAD with 2A 1K is class 2. And so on. Someone help me from my previous question to use pd.cut (I dont know if I am allowed to mention a username here). So I am trying to modified the code and apply to this new problem but the error says 'bins must increase monotically`. This is my code

df['cumsum']=df.groupby('Code')['Qty'].cumsum()
total = df.groupby('Code')['Qty'].sum()
bins = [total*0.15,total*0.05,total*0.8]
labels = ['class3','class2','class1']
df['class']=pd.cut(df['cumsum'], bins=bins, labels=labels, right=False)

this is result I want to achieve :

Code    Notes   Qty   Class
AAD     2A 2K   17    class1
AAD     2A 1K    1    class2
AAG     2A 3K   12    class1 
AAG     4A 4K    2    class2
AAG     5A 5K    1    class3
ALA     1A 2K   10    class1
ALA     1A 1K    3    class2

mozway · Accepted Answer

You can combine cut and groupby.transform:

df['class'] = pd.cut(df.groupby('Code')['Qty'].transform('sum').rdiv(df['Qty']), 
                     bins=[0, 0.15, 0.8, 1], labels=['class3', 'class2', 'class1'])

NB. if you want 80% of the total per group, this doesn't match the provided output. Maybe you want 80% of the maximum? In which can use transform('max').

Output:

  Code  Notes  Qty   class
0  AAD  2A 2K   17  class1
1  AAD  2A 1K    1  class3
2  AAG  2A 3K   12  class2
3  AAG  4A 4K    2  class3
4  AAG  5A 5K    1  class3
5  ALA  1A 2K   10  class2
6  ALA  1A 1K    3  class2

If you want to use a groupby.cumsum (which is not really what you describe):

df['class'] = pd.cut(df.groupby('Code')['Qty'].cumsum().rdiv(df['Qty']), 
                     bins=[0, 0.8, 1], labels=['class2', 'class1'])

Output:

  Code  Notes  Qty   class
0  AAD  2A 2K   14  class1
1  AAD  2A 1K    4  class2
2  AAG  2A 3K   12  class1
3  AAG  4A 4K   10  class2
4  AAG  5A 5K    4  class2
5  ALA  1A 2K   10  class1
6  ALA  1A 1K    8  class2

using np cut and raise error bins must increase on python pandas

Answers (2)

Related Questions