Reputation: 49
How can I create a boxplot like this one using the Bokeh library in Python?
import seaborn as sns
df = sns.load_dataset("titanic")
sns.boxplot(x=df["age"])
Upvotes: 0
Views: 2044
Reputation: 6337
Here is a solution using some random data as input:
import numpy as np
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
output_notebook()
series = pd.Series(list(np.random.randint(0,60,100))+[101]) # one outlier added by hand
Here is the math the boxplot is based on: the quantiles, the interquartile range (IQR), the 1.5 * IQR fences and the mean are calculated, and every value outside the fences is treated as an outlier.
qmin, q1, q2, q3, qmax = series.quantile([0, 0.25, 0.5, 0.75, 1])
iqr = q3 - q1
upper = q3 + 1.5 * iqr
lower = q1 - 1.5 * iqr
mean = series.mean()
out = series[(series > upper) | (series < lower)]
if not out.empty:
    outlier = list(out.values)
These calculations stay the same for both the vertical and the horizontal solution.
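To check the arithmetic by hand, here is a minimal standard-library sketch of the same calculation on a small list. The function name `box_stats` and the dictionary keys are made up for this illustration and are not part of the answer's code. It assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way as the default of `Series.quantile`, and it needs at least two data points. It also returns two possible whisker ends: the fence clipped to the data range, as in the answer, and the most extreme data point inside the fences, which is, as far as I know, what matplotlib and seaborn draw.

```python
import statistics


def box_stats(values, k=1.5):
    """Quartiles, fences, whisker ends and outliers for a list of numbers."""
    data = sorted(values)
    # "inclusive" interpolates linearly between neighbouring data points
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # whisker ends as in the answer: fences clipped to the data range
        "whisker_clipped": (max(data[0], lower_fence), min(data[-1], upper_fence)),
        # alternative: most extreme data points that lie inside the fences
        "whisker_datum": (inside[0], inside[-1]),
        "outliers": outliers,
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["q1"], stats["q3"], stats["iqr"])          # 3.25 7.75 4.5
print(stats["lower_fence"], stats["upper_fence"])      # -3.5 14.5
print(stats["whisker_clipped"], stats["whisker_datum"])  # (1, 14.5) (1, 9)
print(stats["outliers"])                               # [100]
```

With this data the two whisker conventions differ at the upper end (14.5 versus 9), so it is worth deciding which one the plot should show.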
k = 'age'
p = figure(
    tools="save",
    x_range=[k],  # enable categorical axes
    title="Boxplot",
    # width/height replace plot_width/plot_height, which Bokeh 3.x no longer accepts
    width=400,
    height=500,
)
upper = min(qmax, upper)
lower = max(qmin, lower)
hbar_height = (qmax - qmin) / 500
# stems
p.segment([k], upper, [k], q3, line_color="black")
p.segment([k], lower, [k], q1, line_color="black")
# boxes
p.vbar([k], 0.7, q2, q3, line_color="black")
p.vbar([k], 0.7, q1, q2, line_color="black")
# whiskers (almost-0 height rects simpler than segments)
p.rect([k], lower, 0.2, hbar_height, line_color="black")
p.rect([k], upper, 0.2, hbar_height, line_color="black")
if not out.empty:
    p.circle([k] * len(outlier), outlier, size=6, fill_alpha=0.6)
show(p)
To create a horizontal boxplot, hbar is used instead of vbar, and the x and y arguments are swapped in the segment and rect calls.
k = 'age'
p = figure(
    tools="save",
    y_range=[k],  # enable categorical axes
    title="Boxplot",
    # width/height replace plot_width/plot_height, which Bokeh 3.x no longer accepts
    width=400,
    height=500,
)
upper = min(qmax, upper)
lower = max(qmin, lower)
hbar_height = (qmax - qmin) / 500
# stems
p.segment(upper, [k], q3, [k], line_color="black")
p.segment(lower, [k], q1, [k], line_color="black")
# boxes
p.hbar([k], 0.7, q2, q3, line_color="black")
p.hbar([k], 0.7, q1, q2, line_color="black")
# whiskers (almost-0 width rects; rect takes x, y, width, height, so the thin side is now the width)
p.rect(lower, [k], hbar_height, 0.2, line_color="black")
p.rect(upper, [k], hbar_height, 0.2, line_color="black")
if not out.empty:
    p.circle(outlier, [k] * len(outlier), size=6, fill_alpha=0.6)
show(p)
Upvotes: 2