Reputation: 55
I have a pandas data frame 'train' as
Name Comb Sales
Joy A123 102
John A134 112
Aby A123 140
Amit A123 190
Andrew A134 210
Pren A123 109
Abry A134 230
Hulk A134 188
...
For each unique Comb I want to find the 25% quantiles of the corresponding sales and create respective bins. For example if you create 25% quantile bins for the sales of Comb='A123' you will get (102.00 107.25 124.50 152.50 190.00). Now I want to bin my sales for all Comb='A123' using these quantiles. My resulting data would be
Name Comb Sales Bin Bin_Low Bin_High
Joy A123 102 1 102 107.25
John A134 112 1 112 169
Aby A123 140 3 124.50 152.50
Amit A123 190 4 152.90 190
Andrew A134 210 3 199 215
Pren A123 109 2 107.25 124.50
Abry A134 230 4 215 230
Hulk A134 188 2 169 199
I created the below code but my final data frame does not come out in the right format.
quant = pd.DataFrame()
i = ''
for i in train.comb.unique():
a=pd.qcut(train[train.comb == i ]['Sales'], 4,duplicates='drop')
df = pd.DataFrame(np.array(a))
comp=pd.concat([train[train.combo == i ],df], axis=1)
quant=quant.append(comp)
Any help would be appreciated.
Upvotes: 1
Views: 2181
Reputation: 51395
You can use qcut
on your dataframe, grouped by the Comb
. Then, assign the left side to a Bin_low
column and the right to a Bin_max
. Note that the qcut has an open interval on the left
side, so the values will be a tiny bit off of your desired output, but essentially the same:
intervals = train.groupby('Comb')['Sales'].transform(pd.qcut, 4)
train['Bin_low'] = pd.IntervalIndex(intervals).left
train['Bin_high'] = pd.IntervalIndex(intervals).right
>>> train
Name Comb Sales Bin_low Bin_high
0 Joy A123 102 101.999 107.25
1 John A134 112 111.999 169.00
2 Aby A123 140 124.500 152.50
3 Amit A123 190 152.500 190.00
4 Andrew A134 210 199.000 215.00
5 Pren A123 109 107.250 124.50
6 Abry A134 230 215.000 230.00
7 Hulk A134 188 169.000 199.00
Upvotes: 1