Reputation: 820
My categorical variable case_status takes on four unique values. I have data from 2014 to 2016. I would like to plot the distribution of case_status grouped by year. I tried the following:
df.groupby('year').case_status.value_counts().plot.barh()
And I get the following plot:
However, I want the following plot:
Upvotes: 36
Views: 73034
Reputation: 23449
Another way to plot bar plots grouped by year is to use pivot_table() instead: pass the column whose values label the bars to index=, the grouper to columns=, and count the rows in each group with aggfunc='size'. Note that since you can pass any function to aggfunc=, it is more general than value_counts(); with pivot_table, we can also plot e.g. the mean, sum, etc.
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': np.random.choice([2014, 2015, 2016], size=3000), 'case_status': [*['Certified']*2500, *['Certified-Withdrawn']*300, *['Withdrawn']*100, *['Denied']*100]})
# pivot_table() counts the rows per (case_status, year) pair; plot.barh() then draws one bar per year for each status
df.pivot_table(index='case_status', columns='year', aggfunc='size').plot.barh();
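To illustrate the point about aggfunc= being more general than counting, here is a minimal sketch that aggregates a numeric column by its mean. The wage column and the random data are assumptions made up for this example; they are not part of the question's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 3000
statuses = ['Certified', 'Certified-Withdrawn', 'Withdrawn', 'Denied']
df = pd.DataFrame({
    'year': rng.choice([2014, 2015, 2016], size=n),
    'case_status': rng.choice(statuses, size=n),
    'wage': rng.normal(70000, 15000, size=n),  # hypothetical numeric column
})

# Mean wage per (case_status, year); rows are statuses, columns are years
mean_wage = df.pivot_table(index='case_status', columns='year',
                           values='wage', aggfunc='mean')
print(mean_wage.round(0))
```

Chaining .plot.barh() onto mean_wage should give the same kind of grouped bar plot as the size example, with bar lengths showing means instead of counts.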
If the tick labels have to be sorted in some order, then (given they come from the dataframe index) you can sort the index before plotting by using loc[].
Let's say you want the data sorted in the index_order below. Because barh() draws the first index entry at the bottom of the plot, pass the reverse of this order to loc and then call plot.
index_order = ['Certified', 'Certified-Withdrawn', 'Withdrawn', 'Denied']
df.pivot_table(index='case_status', columns='year', aggfunc='size').loc[reversed(index_order)].plot.barh()
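As a quick check of the reordering step, the sketch below builds the pivoted table without plotting and inspects the resulting index. The random data is an assumption for illustration, and index_order[::-1] is used here as an equivalent of reversed(index_order).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index_order = ['Certified', 'Certified-Withdrawn', 'Withdrawn', 'Denied']
df = pd.DataFrame({
    'year': rng.choice([2014, 2015, 2016], size=3000),
    'case_status': rng.choice(index_order, size=3000),  # made-up statuses
})

table = df.pivot_table(index='case_status', columns='year', aggfunc='size')

# Rows are selected in reversed order, so that a horizontal bar plot
# (which starts at the bottom) reads in index_order from top to bottom.
ordered = table.loc[index_order[::-1]]
print(list(ordered.index))
```

Calling ordered.plot.barh() afterwards should place 'Certified' at the top of the plot and 'Denied' at the bottom.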
Upvotes: 1