Reputation: 483
I have created a bar plot with percentages. However, since there is a possibility of attrition, I would like to include N, the number of observations or sample size, in brackets as part of the bar labels. In other words, N should be the count of baseline values for the baseline bar and the count of endline values for the endline bar.
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np
data = {
'id': [1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15],
'survey': ['baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', ],
'growth': [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
}
df = pd.DataFrame(data)
sns.set_style('white')
ax = sns.barplot(data = df,
                 x = 'survey', y = 'growth',
                 estimator = lambda x: np.sum(x) / np.size(x) * 100, ci = None,
                 color = 'cornflowerblue')
ax.bar_label(ax.containers[0], fmt = '%.1f %%', fontsize = 20)
sns.despine(ax = ax, left = True)
ax.grid(True, axis = 'y')
ax.yaxis.set_major_formatter(PercentFormatter(100))
ax.set_xlabel('')
ax.set_ylabel('')
plt.tight_layout()
plt.show()
I would appreciate guidance on how to achieve this. Thanks in advance!
Upvotes: 1
Views: 122
Reputation: 14184
One approach could be as follows.
Use df.groupby on column survey and calculate the number of observations for each survey by applying count to column growth.
Use Series.to_numpy to convert the resulting series to an array.
N = df.groupby('survey', sort=False)['growth'].count().to_numpy()
# array([15, 12], dtype=int64), i.e. N for baseline == 15, for endline == 12
Inside ax.bar_label, we can now create custom labels, combining the values from the array N with the values that are already stored in ax.containers[0].datavalues, which is itself a similar array (i.e. array([60. , 41.66666667])). So, we can do something as follows:
# Turn `N` into italic type (raw string, so that `\i` is not read as an escape sequence)
N_it = r'$\it{N}$'
# Use list comprehension with `zip` to loop through `datavalues` and `N`
# simultaneously and use `f-strings` to produce the custom strings
labels = [f'{np.round(perc,1)}% ({N_it} = {n})'
          for perc, n in zip(ax.containers[0].datavalues, N)]
# Pass `labels` to `ax.bar_label`
ax.bar_label(ax.containers[0], labels = labels, fontsize = 20)
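A note on N under attrition: in pandas, count tallies only the non-missing values of growth, whereas size tallies every row in the group. The sample data has no missing values, so both give the same result there. If respondents who dropped out were instead stored as rows with NaN (an assumption, not something shown in the question), the two would differ. A minimal sketch with hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the question): one endline value is missing.
toy = pd.DataFrame({
    'survey': ['baseline', 'baseline', 'baseline', 'endline', 'endline', 'endline'],
    'growth': [1, 0, 1, 1, np.nan, 0],
})

grouped = toy.groupby('survey', sort=False)['growth']

n_observed = grouped.count().to_numpy()  # non-missing values per survey
n_rows = grouped.size().to_numpy()       # all rows per survey, NaN included

print(n_observed)  # [3 2]
print(n_rows)      # [3 3]
```

Which of the two is the right N depends on what the label is meant to report: observed answers (count) or enrolled respondents (size).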
The labelling code can be included in the snippet that you have provided as follows:
ax = sns.barplot(...)
# ---- start
# ax.bar_label(ax.containers[0], fmt = '%.1f %%', fontsize = 20)
N = df.groupby('survey', sort=False)['growth'].count().to_numpy()
N_it = r'$\it{N}$'
labels = [f'{np.round(perc,1)}% ({N_it} = {n})'
          for perc, n in zip(ax.containers[0].datavalues, N)]
ax.bar_label(ax.containers[0], labels = labels, fontsize = 20)
# ---- end
sns.despine(ax = ax, left = True)
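Result: the bars are labelled 60.0% (N = 15) for baseline and 41.7% (N = 12) for endline, with N set in italics.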
On the italic type, see this SO post.
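As a possible variation, the percentage and N can be taken from a single groupby aggregation, so that both numbers in each label come from the same table. This is only a sketch: the helper name build_bar_labels and the toy data are hypothetical, it assumes growth is coded 0/1 (so that the mean equals the share of ones), and it assumes the bars appear in order of first appearance, which is the same assumption the sort=False call above relies on.

```python
import pandas as pd


def build_bar_labels(df, group_col, value_col):
    """Return one 'xx.x% (N = n)' label per group, in order of first appearance.

    Hypothetical helper: assumes `value_col` holds 0/1 values.
    """
    stats = df.groupby(group_col, sort=False)[value_col].agg(['mean', 'count'])
    # rf-string: `\it` stays literal for mathtext, `{{N}}` renders as `{N}`
    return [rf'{mean * 100:.1f}% ($\it{{N}}$ = {count})'
            for mean, count in zip(stats['mean'], stats['count'])]


# Hypothetical toy data (not the data from the question)
toy = pd.DataFrame({
    'survey': ['baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline'],
    'growth': [1, 1, 0, 0, 1, 1],
})

labels = build_bar_labels(toy, 'survey', 'growth')
print(labels)  # ['75.0% ($\\it{N}$ = 4)', '50.0% ($\\it{N}$ = 2)']
```

The resulting list can be passed on in the same way as before, i.e. ax.bar_label(ax.containers[0], labels = labels, fontsize = 20).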
Upvotes: 1