Agustina

Reputation: 101

Create 100% stacked bar chart

I need to generate a 100% stacked bar chart, including the % of the distribution (with no decimals) or the number of observations.

My dataset looks like this:

[image: sample of the original dataset]

I need to generate a second dataframe that counts the number of active and late entries per month:

[image: table counting actives and lates per month]

And then use this second dataframe to generate my 100% stacked bar chart (it should look something like this):

[image: example 100% stacked bar chart]

Does anybody have an easy way of doing this?

Thanks!!

Upvotes: 4

Views: 22750

Answers (3)

Federico

Reputation: 781

You can use the code below to generate the following chart. Please also read the answer to the end, where I explain why a horizontal bar chart may be better.

[image: resulting 100% stacked bar chart]

My dataset looks like this:

       thermal_sensation_round  thermal_preference
    0                        2              cooler
    1                        2              cooler
    2                        0           no change
    3                        0           no change
    4                        1              warmer

I used the following code to generate the plot. In the code I am doing the following steps:

  1. grouping the data, counting the entries, and normalizing the counts within each group
  2. plotting the data with Pandas' .plot.bar(stacked=True)
  3. placing the legend at the top
  4. using a for loop to add the formatted text to the chart. Note that I do not print the percentage when it is 10% or lower; you can change that threshold.
  5. calling tight_layout() so that the labels and the legend fit inside the figure.
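
    import matplotlib.pyplot as plt
    import pandas as pd

    x_var, y_var = "thermal_sensation_round", "thermal_preference"
    # Share of each y_var category within every x_var group (each row sums to 1).
    df_grouped = df.groupby(x_var)[y_var].value_counts(normalize=True).unstack(y_var)
    df_grouped.plot.bar(stacked=True)
    # Place the legend above the plot, on a single row.
    plt.legend(
        bbox_to_anchor=(0.5, 1.02),
        loc="lower center",
        borderaxespad=0,
        frameon=False,
        ncol=3,
    )
    # Write the rounded percentage in the middle of each segment.
    for ix, row in df_grouped.reset_index(drop=True).iterrows():
        cumulative = 0
        for element in row:
            if pd.isna(element):
                continue  # category absent from this group, nothing to stack
            if element > 0.1:  # skip labels for segments of 10% or less
                plt.text(
                    ix,
                    cumulative + element / 2,
                    f"{element * 100:.0f} %",
                    va="center",
                    ha="center",
                )
            cumulative += element
    plt.tight_layout()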

Horizontal stacked bar plot

A horizontal bar plot is a better idea, since the percentages are easier to read. See the example below.

[image: horizontal 100% stacked bar chart]

This is simple to do: replace the bar function with barh. Note that you also need to swap the x and y coordinates in the text function. The code is below.

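    import matplotlib.pyplot as plt
    import pandas as pd

    x_var, y_var = "thermal_sensation_round", "thermal_preference"
    # Share of each y_var category within every x_var group (each row sums to 1).
    df_grouped = df.groupby(x_var)[y_var].value_counts(normalize=True).unstack(y_var)
    df_grouped.plot.barh(stacked=True)
    # Place the legend above the plot, on a single row.
    plt.legend(
        bbox_to_anchor=(0.5, 1.02),
        loc="lower center",
        borderaxespad=0,
        frameon=False,
        ncol=3,
    )
    # Bars are horizontal, so the cumulative share is the x coordinate
    # and the bar position is the y coordinate.
    for ix, row in df_grouped.reset_index(drop=True).iterrows():
        cumulative = 0
        for element in row:
            if pd.isna(element):
                continue  # category absent from this group, nothing to stack
            if element > 0.1:  # skip labels for segments of 10% or less
                plt.text(
                    cumulative + element / 2,
                    ix,
                    f"{element * 100:.0f} %",
                    va="center",
                    ha="center",
                )
            cumulative += element
    plt.tight_layout()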

Upvotes: 6

Arturo Moncada-Torres

Reputation: 1345

Quang Hoang's answer works great. However, addressing Agustina's comment on how to further modify the plot:

The way I do it is by using axes (ax). First, you create your fig and ax:

    fig, ax = plt.subplots(1, 1, figsize=[10, 5])

Then, you perform your grouping:

    x = 'Date'
    y = 'Status'
    df_grouped = df.groupby(x)[y].value_counts(normalize=True).unstack(y)

After that, you generate your plot. Notice that we choose which axes to plot on by passing ax=ax. Moreover, see how we can already set the colormap (by name) and the column width here.

    df_grouped.plot.bar(stacked=True, cmap='viridis', width=0.75, ax=ax)

After that, you can use ax to make all the adjustments that you want. For instance, add a legend...

    ax.legend(bbox_to_anchor=(1.04, 0.95), title='Status', loc="upper left", frameon=False)

...set your xlabel...

    ax.set_xlabel("Date")

...and so on and so forth. Of course, on top of that, you can add the labels as suggested by Federico. However, to keep it consistent, I would replace plt.text with ax.text.
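As a rough sketch of how the labels could look with ax.text, the snippet below puts the steps above together. The sample data is made up purely for illustration (only the 'Date' and 'Status' column names come from this thread), and the Agg backend is selected only so that the snippet also runs without a display.

    ```python
    import matplotlib
    matplotlib.use("Agg")  # assumption: non-interactive backend, drop this line to show the plot
    import matplotlib.pyplot as plt
    import pandas as pd

    # Hypothetical sample data, invented for this sketch.
    df = pd.DataFrame({
        "Date": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-02", "2020-02"],
        "Status": ["Active", "Late", "Active", "Late", "Late", "Active", "Late"],
    })
    df_grouped = df.groupby("Date")["Status"].value_counts(normalize=True).unstack("Status")

    fig, ax = plt.subplots(1, 1, figsize=[10, 5])
    df_grouped.plot.bar(stacked=True, width=0.75, ax=ax)

    # Same labelling loop as in Federico's answer, but drawn on ax instead of plt.
    labels = []
    for ix, (_, row) in enumerate(df_grouped.iterrows()):
        cumulative = 0.0
        for share in row.dropna():  # skip categories that do not occur in this group
            if share > 0.1:  # leave small segments unlabelled
                labels.append(
                    ax.text(ix, cumulative + share / 2, f"{share * 100:.0f} %",
                            va="center", ha="center")
                )
            cumulative += share
    fig.tight_layout()
    ```

Working on ax throughout means the same code keeps working if you later place several charts in one figure.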

Upvotes: 0

Quang Hoang

Reputation: 150785

You can try value_counts() with normalize:

    (df.groupby('Date')['Status'].value_counts(normalize=True)
       .unstack('Status').plot.bar(stacked=True)
    )
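
If you also want the intermediate table with the number of observations per month, one possible sketch uses pd.crosstab and then divides each row by its total. The sample data below is invented for illustration; I am assuming the columns are called 'Date' and 'Status' and that the statuses are 'Active' and 'Late'.

    ```python
    import pandas as pd

    # Hypothetical sample data, invented for this sketch.
    df = pd.DataFrame({
        "Date": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02", "2020-02", "2020-02"],
        "Status": ["Active", "Late", "Active", "Late", "Late", "Active", "Late"],
    })

    # Intermediate table: number of observations per month and status.
    counts = pd.crosstab(df["Date"], df["Status"])

    # Row-wise shares: divide each row by its total so that every row sums to 1.
    shares = counts.div(counts.sum(axis=1), axis=0)

    print(counts)
    print(shares.round(2))
    ```

Calling shares.plot.bar(stacked=True) should then draw the same 100% stacked chart as the one-liner above, while counts stays available if you prefer to label the bars with the number of observations.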

Upvotes: 12
