Sam Gregson
Sam Gregson

Reputation: 159

Binning a python pandas dataframe: extracting bin centers and the sum of another column

I am having a bit of trouble binning a pandas dataframe and then extracting the necessary plotting variables.

I have a pandas dataframe like the following:

            a    ad    td  price  result  profit_loss  
12935   10809 -1181  2363    262     1.0     616743.0  
13025  -18771   696  1390    350     1.0       1390.0  
13079  -20154   348   695    305     0.0        695.0  
13085    2370  3945  3150    264     0.0    -828450.0 

I want to bin the rows of the dataframe into equal sized bins of the "td" field (e.g. td = 0-100, 100-200, 200-300) and calculate the sum of all the profit_loss entries that fall into that td bin

e.g. here, for a td bin from 0-2000 the profit_loss would be 1390+695.

I then want to plot the td bin centers versus the profit_loss sums.

I have tried:

bins = np.linspace(df.td.min(), df.td.max(), 10)
groups = df.groupby(pd.cut(df.td, bins))

But I'm not sure how to extract the td bin centers and profit_loss sums created and plot them.

Thanks!

Upvotes: 1

Views: 1687

Answers (2)

pacholik
pacholik

Reputation: 8972

Add bin column to the dataframe and then groupby

A bin shall be td - td%binsize (+ binsize/2 if you want centers).

Then just groupby over bin and you can plot

>>> df['bin'] = df.td - df.td % 2000 + 1000
>>> s = df[['bin', 'profit_loss']].groupby('bin').sum()
>>> s
      profit_loss
bin              
1000       2085.0
3000    -211707.0
>>> s.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot object at 0x7fce4fba3358>
>>> plt.show()

enter image description here

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

if you want to show all the empty bins on the X-axis - you can do it this way:

import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')

new = pd.DataFrame({'td':range(0, int(round(df.td.max() / 100) * 100) + 100, 100)})

(pd.merge(new, df.groupby(df.td//100*100)['profit_loss']
                 .sum().reset_index(),
          how='left')
   .fillna(0)
   .set_index('td')
   .plot.bar()
)
plt.axhline(0, color='k')

enter image description here

explanation:

a helper DF, containing all bins

In [68]: new
Out[68]:
      td
0      0
1    100
2    200
3    300
4    400
5    500
6    600
7    700
8    800
9    900
10  1000
11  1100
12  1200
13  1300
14  1400
15  1500
16  1600
17  1700
18  1800
19  1900
20  2000
21  2100
22  2200
23  2300
24  2400
25  2500
26  2600
27  2700
28  2800
29  2900
30  3000
31  3100
32  3200

grouped original DF

In [71]: df.groupby(df.td//100*100)['profit_loss'].sum().reset_index()
Out[71]:
     td  profit_loss
0   600        695.0
1  1300       1390.0
2  2300     616743.0
3  3100    -828450.0

merged / resulting DF

In [69]: (pd.merge(new, df.groupby(df.td//100*100)['profit_loss']
   ....:                  .sum().reset_index(),
   ....:           how='left')
   ....:    .fillna(0)
   ....: )
Out[69]:
      td  profit_loss
0      0          0.0
1    100          0.0
2    200          0.0
3    300          0.0
4    400          0.0
5    500          0.0
6    600        695.0
7    700          0.0
8    800          0.0
9    900          0.0
10  1000          0.0
11  1100          0.0
12  1200          0.0
13  1300       1390.0
14  1400          0.0
15  1500          0.0
16  1600          0.0
17  1700          0.0
18  1800          0.0
19  1900          0.0
20  2000          0.0
21  2100          0.0
22  2200          0.0
23  2300     616743.0
24  2400          0.0
25  2500          0.0
26  2600          0.0
27  2700          0.0
28  2800          0.0
29  2900          0.0
30  3000          0.0
31  3100    -828450.0
32  3200          0.0

Upvotes: 1

Related Questions