The only code I found for putting the median on a boxplot is the one below, and I tried it. But since my boxplot is grouped (several boxes per x category), the x tick locations don't match the individual boxes. How can I find the x positions of the individual boxes? I already tried the minor tick locator, but I still cannot get the locations of the boxes. Any suggestions to improve this plot?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])
df.columns = ['item', 'score', 'grade']
fig = plt.figure(figsize=(6, 3), dpi=150)
ax = sns.boxplot(x='item', y='score', data=df, hue='grade', palette=sns.color_palette('husl'))
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')
medians = df.groupby(['item','grade'])['score'].median().values
median_labels = [str(np.round(s, 2)) for s in medians]
pos = range(len(medians))
for tick, label in zip(pos, ax.get_xticklabels()):
    ax.text(pos[tick], medians[tick], median_labels[tick],
            horizontalalignment='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))
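To see why this loop labels only two boxes, here is a plain-Python sketch with no plotting libraries. It recomputes the medians from the same rows and replays the loop's zip. The hard-coded xticklabels list is an assumption standing in for what ax.get_xticklabels() returns on a hue-nested boxplot: one label per item, not one per box.

```python
from statistics import median

# Same rows as the DataFrame above: (item, score, grade)
rows = [('Apple', 10, 'A'), ('Apple', 8, 'B'), ('Apple', 10, 'C'),
        ('Apple', 5, 'A'), ('Apple', 7, 'B'), ('Apple', 9, 'C'),
        ('Apple', 3, 'A'), ('Apple', 5, 'B'), ('Apple', 4, 'C'),
        ('Orange', 3, 'A'), ('Orange', 4, 'B'), ('Orange', 6, 'C'),
        ('Orange', 2, 'A'), ('Orange', 8, 'B'), ('Orange', 4, 'C'),
        ('Orange', 8, 'A'), ('Orange', 10, 'B'), ('Orange', 1, 'C')]

# Median per (item, grade), in sorted key order like groupby(['item', 'grade'])
groups = {}
for item, score, grade in rows:
    groups.setdefault((item, grade), []).append(score)
medians = [median(groups[key]) for key in sorted(groups)]

# Assumption: one x tick label per item, as on a hue-nested boxplot
xticklabels = ['Apple', 'Orange']

# Replay the loop: zip stops at the shorter sequence (the tick labels)
placed = [(tick, medians[tick]) for tick, _ in zip(range(len(medians)), xticklabels)]
print(len(medians), placed)
```

There are six medians but only two ticks, so only the first two medians are drawn, and both land on the item centres (x = 0 and x = 1) instead of on their own boxes.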
Seaborn is notoriously difficult to work with. The code below works, but it might break if, for example, one of the categories is empty and no box is drawn for it. Use at your own risk:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])
df.columns = ['item', 'score', 'grade']
width = 0.8
hue_col = 'grade'
fig = plt.figure(figsize=(6, 3), dpi=150)
ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width)
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')
# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(df[hue_col].unique())
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()
medians = df.groupby(['item','grade'])['score'].median()
# x0 is the item's tick position; off shifts the label onto the matching hue box
for x0, (_, med0) in enumerate(medians.groupby(level=0)):
    for off, (_, med1) in zip(offsets, med0.groupby(level=1)):
        ax.text(x0 + off, med1.item(), '{:.0f}'.format(med1.item()),
                horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))
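The offset arithmetic can be checked without plotting. With total width w and n hue levels, np.linspace(0, w - w/n, n) has step w/n. After subtracting the mean, the k-th offset is therefore (w/n) * (k - (n - 1)/2). The sketch below reimplements that in plain Python. The function names dodge_offsets and box_centers are hypothetical, not seaborn API. It assumes item i is drawn at x = i, as in the seaborn version linked above. Later seaborn releases may compute box positions differently.

```python
def dodge_offsets(width, n_levels):
    """Centred hue-box offsets around an x tick, equivalent to
    np.linspace(0, width - width / n_levels, n_levels) minus its mean."""
    each_width = width / n_levels
    # The k-th linspace point is k * each_width and the mean is
    # (n_levels - 1) / 2 * each_width, so subtracting gives this formula.
    return [each_width * (k - (n_levels - 1) / 2) for k in range(n_levels)]


def box_centers(items, hue_levels, width=0.8):
    """Map (item, hue) -> x position, assuming item i sits at x = i (hypothetical helper)."""
    offsets = dodge_offsets(width, len(hue_levels))
    return {(item, hue): x0 + off
            for x0, item in enumerate(items)
            for hue, off in zip(hue_levels, offsets)}


centers = box_centers(['Apple', 'Orange'], ['A', 'B', 'C'])
for key, x in sorted(centers.items()):
    print(key, round(x, 4))
```

With width=0.8 and three grades, the boxes sit at about -0.2667, 0 and +0.2667 around each item tick. Those are the positions the answer passes to ax.text as x0 + off.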
In general, to avoid any surprises when modifying a seaborn plot, I would recommend specifying order and hue_order so that the plot is drawn in a predetermined order. Here is another version that is able to deal with a missing category:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame([['Apple', 8, 'B'],['Apple', 10, 'C'],
['Apple', 7, 'B'],['Apple', 9, 'C'],
['Apple', 5, 'B'],['Apple', 4, 'C'],
['Orange', 3, 'A'],['Orange', 6, 'C'],
['Orange', 2, 'A'],['Orange', 4, 'C'],
['Orange', 8, 'A'],['Orange', 1, 'C']])
df.columns = ['item', 'score', 'grade']
order = ['Apple', 'Orange']
hue_col = 'grade'
hue_order = ['A','B','C']
width = 0.8
fig = plt.figure(figsize=(6, 3), dpi=150)
ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width,
order=order, hue_order=hue_order)
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')
# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
# use hue_order so the number of box slots does not depend on which grades happen to be present
n_levels = len(hue_order)
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()
medians = df.groupby(['item','grade'])['score'].median()
# add NaN entries for missing (item, grade) combinations so every box slot is visited
medians = medians.reindex(pd.MultiIndex.from_product([order, hue_order]))
for x0, (_, med0) in enumerate(medians.groupby(level=0)):
    for off, (_, med1) in zip(offsets, med0.groupby(level=1)):
        if not np.isnan(med1.item()):
            ax.text(x0 + off, med1.item(), '{:.0f}'.format(med1.item()),
                    horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))
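To show why the reindexing step matters, here is a plain-Python sketch for this data set with no pandas. The dictionary grouping stands in for groupby, and the offsets use the same formula as the np.linspace call. It places the labels two ways. The first pairs offsets only with the grades that exist for each item, which is one way the first version can go wrong. The second walks the full order x hue_order product and skips the gaps, as the reindexed version does.

```python
from itertools import product
from statistics import median

# Second data set from the answer: Apple has no 'A' rows, Orange has no 'B' rows
rows = [('Apple', 8, 'B'), ('Apple', 10, 'C'), ('Apple', 7, 'B'), ('Apple', 9, 'C'),
        ('Apple', 5, 'B'), ('Apple', 4, 'C'), ('Orange', 3, 'A'), ('Orange', 6, 'C'),
        ('Orange', 2, 'A'), ('Orange', 4, 'C'), ('Orange', 8, 'A'), ('Orange', 1, 'C')]
order = ['Apple', 'Orange']
hue_order = ['A', 'B', 'C']
width = 0.8

# Median per observed (item, grade) pair
groups = {}
for item, score, grade in rows:
    groups.setdefault((item, grade), []).append(score)
medians = {key: median(scores) for key, scores in groups.items()}

# Same centred offsets as linspace(0, width - each_width, n) minus its mean
n = len(hue_order)
each_width = width / n
offsets = [each_width * (k - (n - 1) / 2) for k in range(n)]

# Naive pairing: zip offsets with only the grades present for each item
naive = []
for x0, item in enumerate(order):
    present = sorted(g for (i, g) in medians if i == item)
    for off, grade in zip(offsets, present):
        naive.append((grade, round(x0 + off, 4), medians[(item, grade)]))

# Full-product pairing: every slot keeps its offset, missing pairs are skipped
labels = []
for (x0, item), (off, grade) in product(enumerate(order), zip(offsets, hue_order)):
    if (item, grade) in medians:
        labels.append((grade, round(x0 + off, 4), medians[(item, grade)]))

print(naive)
print(labels)
```

In the naive list, Apple's grade B median ends up in the A slot (x of about -0.2667). The full-product list keeps it at x = 0, where the B box is drawn.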