Perform Aggregation on one column based on the bins specified for value_counts for another one

Question

Suppose I have a Pandas dataframe with two columns viz. df.Price and df.Revenue. I can perform df.Price.value_counts(bins=[5,10,15]) and determine the number of prices within each of the 3 bins.

However, I would like to know what the total Revenue was within those bins, which I specified for the Price column. How do I achieve this? An extension to that would be finding out the number of transactions (count) for each of these bins?

BallpointBen · Accepted Answer

Use pd.cut to create a dummy column containing the bucketing, then group on that.

>>> df = pd.DataFrame({'Price': np.random.randint(0,20,(10,)), 
                       'Revenue': np.random.rand(10)})
>>> df
   Price   Revenue
0      0  0.104462
1      9  0.976338
2      7  0.800895
3     13  0.700494
4     13  0.241352
5      0  0.535348
6     13  0.811419
7     17  0.508165
8     13  0.580809
9      5  0.711055

>>> df['Bucket'] = pd.cut(df['Price'], [-float('inf'), 5, 10, 15, float('inf')])
>>> df
   Price   Revenue        Bucket
0      0  0.104462   (-inf, 5.0]
1      9  0.976338   (5.0, 10.0]
2      7  0.800895   (5.0, 10.0]
3     13  0.700494  (10.0, 15.0]
4     13  0.241352  (10.0, 15.0]
5      0  0.535348   (-inf, 5.0]
6     13  0.811419  (10.0, 15.0]
7     17  0.508165   (15.0, inf]
8     13  0.580809  (10.0, 15.0]
9      5  0.711055   (-inf, 5.0]

>>> df.groupby('Bucket').sum()
              Price   Revenue
Bucket                       
(-inf, 5.0]       5  1.350865
(5.0, 10.0]      16  1.777233
(10.0, 15.0]     52  2.334075
(15.0, inf]      17  0.508165

>>> df.groupby('Bucket')['Revenue']
      .agg(['count', 'sum'])
      .rename(columns={'sum': 'Net Revenue'})                                                              
              count  Net Revenue
Bucket                          
(-inf, 5.0]       3     2.266008
(5.0, 10.0]       3     1.477182
(10.0, 15.0]      1     0.432358
(15.0, inf]       3     2.097361

Perform Aggregation on one column based on the bins specified for value_counts for another one

Answers (1)

Related Questions