Binning a pandas column based on quantiles

Question

I have a pandas data frame 'train' as

Name   Comb   Sales
Joy     A123   102
John    A134   112
Aby     A123   140
Amit    A123   190
Andrew  A134   210
Pren    A123   109
Abry    A134   230
Hulk    A134   188  
...

For each unique Comb I want to find the 25% quantiles of the corresponding sales and create respective bins. For example if you create 25% quantile bins for the sales of Comb='A123' you will get (102.00 107.25 124.50 152.50 190.00). Now I want to bin my sales for all Comb='A123' using these quantiles. My resulting data would be

Name   Comb   Sales  Bin  Bin_Low  Bin_High
Joy     A123   102    1    102     107.25
John    A134   112    1    112     169
Aby     A123   140    3    124.50  152.50
Amit    A123   190    4    152.90  190
Andrew  A134   210    3    199     215
Pren    A123   109    2    107.25  124.50
Abry    A134   230    4    215     230
Hulk    A134   188    2    169     199

I created the below code but my final data frame does not come out in the right format.

     quant = pd.DataFrame()
     i = ''
     for  i  in train.comb.unique():    
     a=pd.qcut(train[train.comb == i ]['Sales'], 4,duplicates='drop')
     df = pd.DataFrame(np.array(a))
     comp=pd.concat([train[train.combo == i ],df], axis=1)
     quant=quant.append(comp)

Any help would be appreciated.

sacuL · Accepted Answer

You can use qcut on your dataframe, grouped by the Comb. Then, assign the left side to a Bin_low column and the right to a Bin_max. Note that the qcut has an open interval on the left side, so the values will be a tiny bit off of your desired output, but essentially the same:

intervals = train.groupby('Comb')['Sales'].transform(pd.qcut, 4)

train['Bin_low'] = pd.IntervalIndex(intervals).left
train['Bin_high'] = pd.IntervalIndex(intervals).right

>>> train
     Name  Comb  Sales  Bin_low  Bin_high
0     Joy  A123    102  101.999    107.25
1    John  A134    112  111.999    169.00
2     Aby  A123    140  124.500    152.50
3    Amit  A123    190  152.500    190.00
4  Andrew  A134    210  199.000    215.00
5    Pren  A123    109  107.250    124.50
6    Abry  A134    230  215.000    230.00
7    Hulk  A134    188  169.000    199.00

Binning a pandas column based on quantiles

Answers (1)

Related Questions