buhtz

Reputation: 12200

Seaborn Boxplot with jittered outliers

I want a boxplot with jittered outliers: only the outliers should be drawn as jittered points, not the non-outliers. Searching the web, you often find a workaround that combines sns.boxplot() and sns.swarmplot().

[Figure: boxplot with red outlier markers, overlaid by a swarmplot drawn with green markers]

The problem with that figure is that the outliers are drawn twice. I don't need the red ones; I only need the jittered (green) ones.

Also, the non-outliers are drawn as well, and I don't need them either.

I also have an open feature request about this upstream, but as far as my research goes, there is currently no built-in Seaborn solution for it.

This is an MWE reproducing the boxplot shown.

#!/usr/bin/env python3
import random
import pandas
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()

random.seed(0)

df = pandas.DataFrame({
    'Vals': random.choices(range(200), k=200)})
df_outliers = pandas.DataFrame({
    'Vals': random.choices(range(400, 700), k=20)})

df = pandas.concat([df, df_outliers], axis=0)

flierprops = {
    'marker': 'o',
    'markeredgecolor': 'red',
    'markerfacecolor': 'none'
}

# Usual boxplot
ax = sns.boxplot(y='Vals', data=df, flierprops=flierprops)

# Add jitter with the swarmplot function
ax = sns.swarmplot(y='Vals', data=df, linewidth=.75, color='none', edgecolor='green')
plt.show()

Upvotes: 1


Answers (1)

JohanC

Reputation: 80509

Here is an approach to get jittered outliers. The jitter is similar to that of sns.stripplot(), not to sns.swarmplot(), which uses a rather elaborate spreading algorithm. Basically, every "line" object of the subplot is checked for a marker; in a boxplot only the fliers have one. The x-positions of those lines are then shifted by a small random amount to create jitter. You might want to vary the amount of jitter, e.g. when you are working with hue.

import random
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()

random.seed(0)

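df = pd.DataFrame({
    'Vals': random.choices(range(200), k=200)})
df_outliers = pd.DataFrame({
    'Vals': random.choices(range(400, 700), k=20)})

# ignore_index avoids duplicate index labels in the combined frame
df = pd.concat([df, df_outliers], axis=0, ignore_index=True)

flierprops = {
    'marker': 'o',
    'markeredgecolor': 'red',
    'markerfacecolor': 'none'
}

# Usual boxplot
ax = sns.boxplot(y='Vals', data=df, flierprops=flierprops)

# Only the flier lines carry a marker; whiskers, caps and medians have none.
# Shift their x-positions by a random amount to create stripplot-like jitter.
for line in ax.lines:
    if line.get_marker() != '':
        xs = np.asarray(line.get_xdata(), dtype=float)
        line.set_xdata(xs + np.random.uniform(-0.2, 0.2, len(xs)))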

plt.tight_layout()
plt.show()

[Figure: sns.boxplot with jittered outliers]
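The loop above can also be wrapped in a small reusable function so that the jitter width and the random seed are adjustable, e.g. to use less jitter when several narrower boxes share the axis. This is only a sketch: `jitter_fliers` is a hypothetical helper name, it assumes vertical boxes (for orient='h' the y-data would have to be jittered instead), and the two-group data is made up for illustration.

```python
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def jitter_fliers(ax, width=0.2, seed=None):
    """Add uniform jitter in [-width, width) to the x-data of every line
    on `ax` that has a marker (for a boxplot: the fliers).

    Hypothetical helper wrapping the loop above; assumes vertical boxes.
    Lines without a marker (whiskers, caps, medians) are left untouched.
    """
    rng = np.random.default_rng(seed)
    for line in ax.lines:
        if line.get_marker() in ('', 'None', None):
            continue
        xs = np.asarray(line.get_xdata(), dtype=float)
        line.set_xdata(xs + rng.uniform(-width, width, len(xs)))
    return ax


# Two groups, each with 100 regular values and 10 far-away outliers
random.seed(0)
df = pd.DataFrame({
    'Group': ['a'] * 110 + ['b'] * 110,
    'Vals': (random.choices(range(200), k=100) + random.choices(range(400, 700), k=10)
             + random.choices(range(200), k=100) + random.choices(range(400, 700), k=10)),
})

ax = sns.boxplot(x='Group', y='Vals', data=df)
jitter_fliers(ax, width=0.15, seed=0)  # narrower jitter for narrower boxes
plt.show()
```

Passing a fixed `seed` makes the jittered positions reproducible between runs, which the unseeded np.random.uniform call in the loop above does not.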

An alternative approach could be to filter out the outliers and then call sns.swarmplot() or sns.stripplot() with only those points. As seaborn doesn't return the values it calculates to position the whiskers, you might need to calculate them again (e.g. via scipy), taking into account seaborn's grouping by x and by hue.
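A minimal sketch of that alternative for a single box, assuming seaborn's default whis=1.5 and matplotlib's quartile definition (np.percentile with linear interpolation). It computes the fences with NumPy instead of scipy; with x or hue groups the mask would have to be computed per group (e.g. via groupby). `tukey_outliers` is a hypothetical helper name, and stripplot is used with a plain green color rather than hollow markers.

```python
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def tukey_outliers(values, whis=1.5):
    """Return a boolean mask of values outside Q1 - whis*IQR .. Q3 + whis*IQR.

    Hypothetical helper; assumes the same rule matplotlib's boxplot uses
    for a scalar `whis`, with quartiles from np.percentile.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - whis * iqr) | (values > q3 + whis * iqr)


random.seed(0)
df = pd.concat([
    pd.DataFrame({'Vals': random.choices(range(200), k=200)}),
    pd.DataFrame({'Vals': random.choices(range(400, 700), k=20)}),
], ignore_index=True)

is_outlier = tukey_outliers(df['Vals'])

# Draw the box without its own fliers, then overlay only the outliers with jitter
ax = sns.boxplot(y='Vals', data=df, showfliers=False)
sns.stripplot(y='Vals', data=df[is_outlier], jitter=0.2, color='green', ax=ax)
plt.show()
```

`matplotlib.cbook.boxplot_stats(values)[0]['fliers']` applies the same 1.5 IQR rule and can serve as a cross-check for the mask.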

Upvotes: 3
