Include both % and N as bar labels

Question

I have created a bar plot with percentages. However, since there is a possibility of attrition, I would like to include N, the number of observations (sample size), in brackets as part of each bar label. In other words, N should be the count of baseline and endline values, respectively.

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np

data = {
'id': [1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15],
'survey': ['baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', ],
'growth': [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
}

df = pd.DataFrame(data)

sns.set_style('white')
ax = sns.barplot(data = df,
                 x = 'survey', y = 'growth',
                 estimator = lambda x: np.sum(x) / np.size(x) * 100, ci = None,  # seaborn >= 0.12: errorbar = None replaces ci = None
                 color = 'cornflowerblue')
ax.bar_label(ax.containers[0], fmt = '%.1f %%', fontsize = 20)

sns.despine(ax = ax, left = True)
ax.grid(True, axis = 'y')
ax.yaxis.set_major_formatter(PercentFormatter(100))
ax.set_xlabel('')
ax.set_ylabel('')
plt.tight_layout()
plt.show()

I would appreciate guidance on how to achieve this. Thanks in advance!

ouroboros1 · Accepted Answer

One approach could be as follows.

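First, use df.groupby on column survey and count the observations for each survey by applying count to column growth. Passing sort=False keeps the groups in order of first appearance (baseline, then endline), which is the same order seaborn uses for the bars by default, so each N lines up with the right bar.
Then chain Series.to_numpy to convert the resulting series to an array.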

N = df.groupby('survey', sort=False)['growth'].count().to_numpy()
# array([15, 12], dtype=int64), i.e. N for baseline == 15, for endline == 12
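As a cross-check, both the percentage and N can be computed directly from df in a single groupby, which makes it easy to confirm that each count belongs to the right survey wave. This is a minimal sketch, assuming the data dict from the question; the name `summary` is illustrative only.

```python
import pandas as pd

# Same data as in the question
data = {
    'id': [1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15],
    'survey': ['baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline',
               'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline',
               'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline',
               'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline'],
    'growth': [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
}
df = pd.DataFrame(data)

# sort=False keeps groups in order of first appearance ('baseline', then 'endline')
summary = df.groupby('survey', sort=False)['growth'].agg(['mean', 'count'])
summary['pct'] = summary['mean'] * 100  # share of growth == 1, in percent

print(summary[['pct', 'count']])
```

Because the percentage and the count come from the same grouped rows, they cannot drift out of step, which is a useful check if the survey labels are ever reordered.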

We can now build custom labels for ax.bar_label by combining the values from the array N with the bar heights already stored in ax.containers[0].datavalues, which is itself a similar array (here array([60. , 41.66666667])). So, we can do something as follows:

# Turn `N` into italic type (raw string, so the backslash is not read as an escape)
N_it = r'$\it{N}$'

# Use list comprehension with `zip` to loop through `datavalues` and `N`
# simultaneously and use `f-strings` to produce the custom strings
labels=[f'{np.round(perc,1)}% ({N_it} = {n})' 
        for perc, n in zip(ax.containers[0].datavalues, N)]

# Pass `labels` to `ax.bar_label` 
ax.bar_label(ax.containers[0], labels = labels, fontsize = 20)
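If you prefer a format spec over np.round, f'{perc:.1f}' produces the same one-decimal text (including the trailing zero in 60.0). The sketch below only builds the label strings, using the datavalues and N arrays quoted above as stand-ins for the real plot, so it runs without matplotlib; the raw string r'$\it{N}$' keeps the backslash intact for matplotlib's mathtext.

```python
import numpy as np

# Stand-ins for ax.containers[0].datavalues and N from the answer above
datavalues = np.array([60.0, 41.66666667])
N = np.array([15, 12])

# Raw string: the backslash in \it must reach mathtext unchanged
N_it = r'$\it{N}$'

# One label per bar: percentage to one decimal place, then N in brackets
labels = [f'{perc:.1f}% ({N_it} = {n})' for perc, n in zip(datavalues, N)]
print(labels)
```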

So, we can include this in the code snippet that you have provided as follows:

ax = sns.barplot(...)

# ---- start
# ax.bar_label(ax.containers[0], fmt = '%.1f %%', fontsize = 20)

N = df.groupby('survey', sort=False)['growth'].count().to_numpy()
N_it = r'$\it{N}$'
labels=[f'{np.round(perc,1)}% ({N_it} = {n})' 
        for perc, n in zip(ax.containers[0].datavalues, N)]

ax.bar_label(ax.containers[0], labels = labels, fontsize = 20)
# ---- end

sns.despine(ax = ax, left = True)
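For a self-contained variant that does not rely on seaborn's category ordering at all, the same labels can be attached to plain matplotlib bars drawn from one grouped summary. This swaps sns.barplot for ax.bar and is a sketch of an alternative, not the answer's code; the Agg backend line is an assumption so it runs headless (remove it and call plt.show() to display the plot).

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import pandas as pd

data = {
    'id': [1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15],
    'survey': ['baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline',
               'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline',
               'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline',
               'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline'],
    'growth': [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
}
df = pd.DataFrame(data)

# Percentage and N per survey wave, taken from the same grouped rows
summary = df.groupby('survey', sort=False)['growth'].agg(['mean', 'count'])
pct = summary['mean'] * 100

fig, ax = plt.subplots()
bars = ax.bar(summary.index.tolist(), pct.to_numpy(), color='cornflowerblue')

N_it = r'$\it{N}$'
labels = [f'{p:.1f}% ({N_it} = {n})' for p, n in zip(pct, summary['count'])]
texts = ax.bar_label(bars, labels=labels, fontsize=20)

# Rough equivalent of sns.despine(left=True) plus the styling from the question
for side in ('top', 'right', 'left'):
    ax.spines[side].set_visible(False)
ax.grid(True, axis='y')
ax.yaxis.set_major_formatter(PercentFormatter(100))
fig.tight_layout()
```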

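Result (plot not reproduced here): the bars are labelled 60.0% (N = 15) for baseline and 41.7% (N = 12) for endline, with N in italics.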

On the italic type, see this SO post.
