Reputation: 159
I am having a bit of trouble binning a pandas dataframe and then extracting the necessary plotting variables.
I have a pandas dataframe like the following:
a ad td price result profit_loss
12935 10809 -1181 2363 262 1.0 616743.0
13025 -18771 696 1390 350 1.0 1390.0
13079 -20154 348 695 305 0.0 695.0
13085 2370 3945 3150 264 0.0 -828450.0
I want to bin the rows of the dataframe into equal sized bins of the "td" field (e.g. td = 0-100, 100-200, 200-300) and calculate the sum of all the profit_loss entries that fall into that td bin
e.g. here, for a td bin from 0-2000 the profit_loss would be 1390+695.
I then want to plot the td bin centers versus the profit_loss sums.
I have tried:
bins = np.linspace(df.td.min(), df.td.max(), 10)
groups = df.groupby(pd.cut(df.td, bins))
But I'm not sure how to extract the td bin centers and profit_loss sums created and plot them.
Thanks!
Upvotes: 1
Views: 1687
Reputation: 8972
A bin shall be td - td%binsize
(+ binsize/2
if you want centers).
Then just groupby over bin and you can plot
>>> df['bin'] = df.td - df.td % 2000 + 1000
>>> s = df[['bin', 'profit_loss']].groupby('bin').sum()
>>> s
profit_loss
bin
1000 2085.0
3000 -211707.0
>>> s.plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot object at 0x7fce4fba3358>
>>> plt.show()
Upvotes: 1
Reputation: 210832
if you want to show all the empty bins on the X-axis - you can do it this way:
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
new = pd.DataFrame({'td':range(0, int(round(df.td.max() / 100) * 100) + 100, 100)})
(pd.merge(new, df.groupby(df.td//100*100)['profit_loss']
.sum().reset_index(),
how='left')
.fillna(0)
.set_index('td')
.plot.bar()
)
plt.axhline(0, color='k')
explanation:
a helper DF, containing all bins
In [68]: new
Out[68]:
td
0 0
1 100
2 200
3 300
4 400
5 500
6 600
7 700
8 800
9 900
10 1000
11 1100
12 1200
13 1300
14 1400
15 1500
16 1600
17 1700
18 1800
19 1900
20 2000
21 2100
22 2200
23 2300
24 2400
25 2500
26 2600
27 2700
28 2800
29 2900
30 3000
31 3100
32 3200
grouped original DF
In [71]: df.groupby(df.td//100*100)['profit_loss'].sum().reset_index()
Out[71]:
td profit_loss
0 600 695.0
1 1300 1390.0
2 2300 616743.0
3 3100 -828450.0
merged / resulting DF
In [69]: (pd.merge(new, df.groupby(df.td//100*100)['profit_loss']
....: .sum().reset_index(),
....: how='left')
....: .fillna(0)
....: )
Out[69]:
td profit_loss
0 0 0.0
1 100 0.0
2 200 0.0
3 300 0.0
4 400 0.0
5 500 0.0
6 600 695.0
7 700 0.0
8 800 0.0
9 900 0.0
10 1000 0.0
11 1100 0.0
12 1200 0.0
13 1300 1390.0
14 1400 0.0
15 1500 0.0
16 1600 0.0
17 1700 0.0
18 1800 0.0
19 1900 0.0
20 2000 0.0
21 2100 0.0
22 2200 0.0
23 2300 616743.0
24 2400 0.0
25 2500 0.0
26 2600 0.0
27 2700 0.0
28 2800 0.0
29 2900 0.0
30 3000 0.0
31 3100 -828450.0
32 3200 0.0
Upvotes: 1