Reputation: 2584
I have a dataset that looks like this:
Country m1 m2 m3
Canada 1 43 0.2
Canada 3 43 0.5
Canada 4 41 0.1
Canada 2 46 0.3
Sweden 4 46 0.4
Sweden 2 48 0.5
Sweden 3 39 0.5
France 5 43 0.1
France 2 48 0.1
France 3 49 0.9
I would like to make a histogram that bins m3 into, say, 5 bins (or whatever is appropriate) and stacks each bin by country.
For example, the bin 0 - 0.1 would be a stacked bar that is 2/3 France and 1/3 Canada, with the countries distinguished by color and listed in a legend.
I have the following:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
x= df['m3']
num_bins = 5
plt.hist(x, num_bins, density=1, histtype='bar', stacked=True, label=df['Country'] )
plt.show()
But it does not stack the bars at all. I think I am doing something wrong here...
Upvotes: 0
Views: 3537
Reputation: 12417
Another option, which counts each distinct m3 value per country instead of binning, could be:
df_plot = df.groupby(['m3', 'Country']).size().reset_index().pivot(columns='Country', index='m3', values=0)
df_plot.plot(kind='bar', stacked=True)
plt.show()
Upvotes: 0
Reputation: 862601
You can use crosstab with cut and plot with DataFrame.plot.bar:
# use a new name so the original df stays available
counts = pd.crosstab(pd.cut(df['m3'], 5), df['Country'])
print (counts)
Country         Canada  France  Sweden
m3
(0.0992, 0.26]       2       2       0
(0.26, 0.42]         1       0       1
(0.42, 0.58]         1       0       2
(0.74, 0.9]          0       1       0
counts.plot.bar(stacked=True)
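Note that the crosstab output above has no row for the empty bin (0.58, 0.74], so the bar chart skips it. Here is a minimal sketch of one way to keep empty bins, assuming you are happy to pick the bin edges yourself. The edges and the helper name binned_counts are made up for illustration, and the data is retyped from the question rather than read from data.csv:

```python
import pandas as pd

# Sample data retyped from the question (stands in for data.csv).
df = pd.DataFrame({
    "Country": ["Canada"] * 4 + ["Sweden"] * 3 + ["France"] * 3,
    "m3": [0.2, 0.5, 0.1, 0.3, 0.4, 0.5, 0.5, 0.1, 0.1, 0.9],
})


def binned_counts(frame, edges):
    """Count rows per (m3 bin, Country); hypothetical helper for this sketch."""
    # include_lowest=True so a value equal to the first edge is not dropped.
    bins = pd.cut(frame["m3"], bins=edges, include_lowest=True)
    # dropna=False keeps categories (bins) that contain no rows.
    return pd.crosstab(bins, frame["Country"], dropna=False)


# Illustrative bin edges, not taken from the question.
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
counts = binned_counts(df, edges)

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    counts.plot.bar(stacked=True)
    plt.show()
```

With fixed edges every bin gets a slot on the x-axis, so a gap in the data shows up as a gap in the plot instead of disappearing.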
Or use DataFrame.pivot with DataFrame.plot.hist:
df1 = df.pivot(columns='Country', values='m3')
print (df1)
Country  Canada  France  Sweden
0           0.2     NaN     NaN
1           0.5     NaN     NaN
2           0.1     NaN     NaN
3           0.3     NaN     NaN
4           NaN     NaN     0.4
5           NaN     NaN     0.5
6           NaN     NaN     0.5
7           NaN     0.1     NaN
8           NaN     0.1     NaN
9           NaN     0.9     NaN
df1.plot.hist(stacked=True, bins=5)
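As for why the code in the question does not stack: hist only stacks when it receives several datasets, and a single m3 column counts as one dataset. If you would rather stay with matplotlib directly, something like the following sketch should work, passing one array per country. The function name stacked_hist is made up for illustration, and the data is retyped from the question:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data retyped from the question (stands in for data.csv).
df = pd.DataFrame({
    "Country": ["Canada"] * 4 + ["Sweden"] * 3 + ["France"] * 3,
    "m3": [0.2, 0.5, 0.1, 0.3, 0.4, 0.5, 0.5, 0.1, 0.1, 0.9],
})


def stacked_hist(frame, num_bins=5):
    """Draw a stacked histogram of m3 with one layer per country."""
    # One array of m3 values per country, in sorted country order.
    groups = {name: g["m3"].to_numpy() for name, g in frame.groupby("Country")}
    fig, ax = plt.subplots()
    # A list of arrays is treated as several datasets sharing the same bins.
    counts, edges, _ = ax.hist(
        list(groups.values()),
        bins=num_bins,
        stacked=True,
        label=list(groups.keys()),
    )
    ax.set_xlabel("m3")
    ax.set_ylabel("count")
    ax.legend()
    return fig, counts, edges


if __name__ == "__main__":
    stacked_hist(df)
    plt.show()
```

With stacked=True the returned counts are cumulative across the layers, so the last entry holds the total height of each bar.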
Upvotes: 1