lemon93
lemon93

Reputation: 169

How to put multiple median values in the boxplot?

I only found the code can put median in boxplot and I tried it. But since my boxplot is multiple, so it unable to get the x-tick get locator. How can I find the minor tick locator of the boxplot, I already tried it yet still cannot get the location of multiple boxplot location. Any suggestion to improve this plot?

df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
              ['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
              ['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
              ['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
              ['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
              ['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])

df.columns = ['item', 'score', 'grade']


fig = plt.figure(figsize=(6, 3), dpi=150)

ax = sns.boxplot(x='item', y='score', data=df, hue='grade', palette=sns.color_palette('husl'))
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')

medians = df.groupby(['item','grade'])['score'].median().values
median_labels = [str(np.round(s, 2)) for s in medians]

pos = range(len(medians))
for tick,label in zip(pos, ax.get_xticklabels()):
    ax.text(pos[tick], medians[tick], median_labels[tick], 
            horizontalalignment='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))

enter image description here

Upvotes: 1

Views: 673

Answers (1)

Diziet Asahi
Diziet Asahi

Reputation: 40667

Seaborn is notoriously difficult to work with. The code below works but might break if one of the category is empty and no boxplot is drawn for example, use at your own risks:

df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
              ['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
              ['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
              ['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
              ['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
              ['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])

df.columns = ['item', 'score', 'grade']


width = 0.8
hue_col = 'grade'

fig, plt.figure(figsize=(6, 3), dpi=150)
ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width)
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')

# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(df[hue_col].unique())
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()

medians = df.groupby(['item','grade'])['score'].median()

for x0,(_,med0) in enumerate(medians.groupby(level=0)):
    for off,(_,med1) in zip(offsets,med0.groupby(level=1)):
        ax.text(x0+off, med1.item(), '{:.0f}'.format(med1.item()), 
            horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))

enter image description here

In general, to avoid any surpises, if you want to modify a seaborn plot, I would recommend you specify order and hue_order so that the plot is drawn in a pre-determined order. Here is an other version that is able to deal with a missing category:

df = pd.DataFrame([['Apple', 8, 'B'],['Apple', 10, 'C'],
              ['Apple', 7, 'B'],['Apple', 9, 'C'],
              ['Apple', 5, 'B'],['Apple', 4, 'C'],
              ['Orange', 3, 'A'],['Orange', 6, 'C'],
              ['Orange', 2, 'A'],['Orange', 4, 'C'],
              ['Orange', 8, 'A'],['Orange', 1, 'C']])

df.columns = ['item', 'score', 'grade']


order = ['Apple', 'Orange']
hue_col = 'grade'
hue_order = ['A','B','C']
width = 0.8

fig, plt.figure(figsize=(6, 3), dpi=150)
ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width,
                order=order, hue_order=hue_order)
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')

# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(df[hue_col].unique())
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()

medians = df.groupby(['item','grade'])['score'].median()
medians = medians.reindex(pd.MultiIndex.from_product([order,hue_order]))

for x0,(_,med0) in enumerate(medians.groupby(level=0)):
    for off,(_,med1) in zip(offsets,med0.groupby(level=1)):
        if not np.isnan(med1.item()):
            ax.text(x0+off, med1.item(), '{:.0f}'.format(med1.item()), 
                horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))

enter image description here

Upvotes: 2

Related Questions