sci-guy

Reputation: 2584

Stacked histogram in Pandas Python

I have a dataset that looks like this:

Country m1  m2  m3
Canada  1   43  0.2
Canada  3   43  0.5
Canada  4   41  0.1
Canada  2   46  0.3
Sweden  4   46  0.4
Sweden  2   48  0.5
Sweden  3   39  0.5
France  5   43  0.1
France  2   48  0.1
France  3   49  0.9

I would like to make a histogram that bins m3 into, say, 5 bins (or whatever is appropriate) and stacks each bin by country.

So the bin 0 - 0.1 would have a stacked bar that is 2/3 France and 1/3 Canada (represented by colors, with a legend).

I have the following:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
x= df['m3']
num_bins = 5
plt.hist(x, num_bins, density=1, histtype='bar', stacked=True, label=df['Country'] )
plt.show()

But it is not stacking at all. I think I am doing something wrong here.

Upvotes: 0

Views: 3537

Answers (2)

Joe

Reputation: 12417

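Another option is to count the rows for each (m3, Country) pair and pivot the result. Note that this draws one bar per distinct m3 value rather than binning the values: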

df_plot = df.groupby(['m3', 'Country']).size().reset_index().pivot(columns='Country', index='m3', values=0)
df_plot.plot(kind='bar', stacked=True)
plt.show()


Upvotes: 0

jezrael

Reputation: 862601

You can use crosstab with cut and plot with DataFrame.plot.bar (bins with no observations, here (0.58, 0.74], do not appear in the table):

# Keep the original df intact so the second option below still works
counts = pd.crosstab(pd.cut(df['m3'], 5), df['Country'])
print(counts)
Country         Canada  France  Sweden
m3                                    
(0.0992, 0.26]       2       2       0
(0.26, 0.42]         1       0       1
(0.42, 0.58]         1       0       2
(0.74, 0.9]          0       1       0

counts.plot.bar(stacked=True)

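If you want each bar to show shares rather than counts (as in the question's "2/3 France and 1/3 Canada" example), one possibility is to pass normalize='index' to crosstab. This is only a minimal sketch: it assumes the question's sample data is built inline instead of read from data.csv, and the names binned and shares are illustrative.

```python
import pandas as pd

# Sample data copied from the question (only the columns used here).
df = pd.DataFrame({
    'Country': ['Canada'] * 4 + ['Sweden'] * 3 + ['France'] * 3,
    'm3': [0.2, 0.5, 0.1, 0.3, 0.4, 0.5, 0.5, 0.1, 0.1, 0.9],
})

# Cut m3 into 5 equal-width bins, then compute each country's share per bin.
binned = pd.cut(df['m3'], 5)
shares = pd.crosstab(binned, df['Country'], normalize='index')

# Each non-empty bin is drawn as one bar split by country share.
ax = shares.plot.bar(stacked=True)
ax.set_ylabel('share of bin')
```

With 5 bins the first bin covers both 0.1 and 0.2, so it comes out half Canada and half France rather than 2/3 and 1/3; narrower bins would be needed to isolate the 0.1 values.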

Or use DataFrame.pivot with DataFrame.plot.hist:

df1 = df.pivot(columns='Country', values='m3')
print (df1)
Country  Canada  France  Sweden
0           0.2     NaN     NaN
1           0.5     NaN     NaN
2           0.1     NaN     NaN
3           0.3     NaN     NaN
4           NaN     NaN     0.4
5           NaN     NaN     0.5
6           NaN     NaN     0.5
7           NaN     0.1     NaN
8           NaN     0.1     NaN
9           NaN     0.9     NaN

df1.plot.hist(stacked=True, bins=5)

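As for why the code in the question does not stack: plt.hist only stacks when it is given several datasets, and label does not split a single array into groups. A minimal sketch that stays in plain matplotlib follows. It assumes the question's sample data is built inline, and the name groups is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question (only the columns used here).
df = pd.DataFrame({
    'Country': ['Canada'] * 4 + ['Sweden'] * 3 + ['France'] * 3,
    'm3': [0.2, 0.5, 0.1, 0.3, 0.4, 0.5, 0.5, 0.1, 0.1, 0.9],
})

# One array of m3 values per country; hist treats each array as its own dataset.
groups = {country: g['m3'].to_numpy() for country, g in df.groupby('Country')}

# With stacked=True the returned counts are cumulative across datasets,
# so the last row holds the total height of each bar.
counts, edges, _ = plt.hist(
    list(groups.values()),
    bins=5,
    stacked=True,
    label=list(groups.keys()),
)
plt.xlabel('m3')
plt.ylabel('count')
plt.legend()
```

Outside a notebook, call plt.show() afterwards to display the figure.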

Upvotes: 1
