Reputation: 1285
This seems like a trivial question, but I've been searching for a while and can't find an answer. It also seems like something that should be a standard part of these packages. Is there a standard way to add statistical annotations between distribution plots in seaborn?
For example, between two box plots or swarm plots?
Upvotes: 56
Views: 68940
Reputation: 61
You could use the package starbars. You give it the pairs and their p-values, and it draws the annotations for you:
import seaborn as sns
import matplotlib.pyplot as plt
import starbars
# taking from the previous example
tips = sns.load_dataset("tips")
df = tips.pivot(columns='day', values='total_bill')
data = [df[c].dropna().tolist() for c in df.columns]
sns.boxplot(x="day", y="total_bill", data=tips)
# adding statistical annotation
annotations = [("Sat", "Sun", 0.002), ("Fri", "Thur", 0.05)]
starbars.draw_annotation(annotations)
plt.show()
It also has an option not to show non-significant p-value bars:
starbars.draw_annotation(annotations, ns_show=False)
You can find the starbars documentation here.
Disclaimer: I'm the author of the package.
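If you compute the p-values yourself and want star labels for them, a threshold convention is usually applied. The sketch below assumes the common cutoffs 0.05, 0.01 and 0.001; `p_to_stars` is a hypothetical helper written for illustration, not part of starbars, and the package may use different cutoffs.

```python
def p_to_stars(p, thresholds=((0.001, "***"), (0.01, "**"), (0.05, "*"))):
    """Map a p-value to a significance label (assumed convention, not starbars' own)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value must be in [0, 1], got {p}")
    # thresholds are checked from strictest to loosest
    for cutoff, label in thresholds:
        if p <= cutoff:
            return label
    return "ns"  # not significant under the assumed cutoffs


# same (group, group, p-value) tuples as in the example above
annotations = [("Sat", "Sun", 0.002), ("Fri", "Thur", 0.05)]
labels = [(a, b, p_to_stars(p)) for a, b, p in annotations]
print(labels)
```

Whether a p-value exactly equal to a cutoff counts as significant is a choice; this sketch treats `p <= cutoff` as significant.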
Upvotes: 4
Reputation: 11301
A brace / bracket can be plotted directly with matplotlib.pyplot.plot or matplotlib.axes.Axes.plot, and annotations can be added with matplotlib.pyplot.text or matplotlib.axes.Axes.text.
seaborn categorical plots are 0-indexed, whereas box plots drawn with matplotlib and pandas are placed at range(1, N+1) by default, which can be adjusted with the positions parameter.
seaborn is a high-level API for matplotlib, and pandas.DataFrame.plot uses matplotlib as the default backend.
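To make the indexing difference concrete, here is a minimal sketch that maps category names to x positions. `box_positions` is a hypothetical helper for illustration only; it assumes the boxes are drawn in the order given and does not call any plotting library.

```python
def box_positions(categories, zero_based=True):
    """Return {category: x position} for boxes drawn in the given order."""
    start = 0 if zero_based else 1
    return {name: start + i for i, name in enumerate(categories)}


days = ["Thur", "Fri", "Sat", "Sun"]  # order of the boxes on the x-axis

seaborn_pos = box_positions(days)                # seaborn: positions start at 0
mpl_pos = box_positions(days, zero_based=False)  # matplotlib / pandas default: start at 1

# x coordinates for a bracket between 'Sat' and 'Sun' on a seaborn plot
x1, x2 = seaborn_pos["Sat"], seaborn_pos["Sun"]
print(x1, x2)
```

Looking positions up by name avoids hard-coding indices that silently change when the category order changes.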
import seaborn as sns
import matplotlib.pyplot as plt
# dataframe in long form for seaborn
tips = sns.load_dataset("tips")
# dataframe in wide form for plotting with pandas.DataFrame.plot
df = tips.pivot(columns='day', values='total_bill')
# data as a list of lists for plotting directly with matplotlib (no nan values allowed)
data = [df[c].dropna().tolist() for c in df.columns]
seaborn
sns.boxplot(x="day", y="total_bill", data=tips, palette="PRGn")
# statistical annotation
x1, x2 = 2, 3 # columns 'Sat' and 'Sun' (first column: 0, see plt.xticks())
y, h, col = tips['total_bill'].max() + 2, 2, 'k'
plt.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
plt.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)
plt.show()
pandas.DataFrame.plot
ax = df.plot(kind='box', positions=range(len(df.columns)))
x1, x2 = 2, 3
y, h, col = df.max().max() + 2, 2, 'k'
ax.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
ax.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)
matplotlib
plt.boxplot(data, positions=range(len(data)))
x1, x2 = 2, 3
y, h, col = max(map(max, data)) + 2, 2, 'k'
plt.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
plt.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
df.head()
day  Thur  Fri  Sat    Sun
0     NaN  NaN  NaN  16.99
1     NaN  NaN  NaN  10.34
2     NaN  NaN  NaN  21.01
3     NaN  NaN  NaN  23.68
4     NaN  NaN  NaN  24.59
data
[[27.2, 22.76, 17.29, ..., 20.53, 16.47, 18.78],
[28.97, 22.49, 5.75, ..., 13.42, 16.27, 10.09],
[20.65, 17.92, 20.29, ..., 29.03, 27.18, 22.67, 17.82],
[16.99, 10.34, 21.01, ..., 18.15, 23.1, 15.69]]
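The bracket in the three examples above is always the same four-point line plus a centred label, so the coordinates can be factored out. This is a minimal sketch; `bracket_coords` is a hypothetical helper, and the values `y=50` and `h=2` are arbitrary illustration values rather than numbers taken from the dataset.

```python
def bracket_coords(x1, x2, y, h):
    """Return (xs, ys, text_xy) for a bracket spanning x1..x2.

    The bracket starts at height y, rises by h, and the label position
    is centred on the horizontal bar.
    """
    xs = [x1, x1, x2, x2]
    ys = [y, y + h, y + h, y]
    text_xy = ((x1 + x2) * 0.5, y + h)
    return xs, ys, text_xy


# bracket between the boxes at positions 2 and 3
xs, ys, (tx, ty) = bracket_coords(2, 3, y=50, h=2)
print(xs, ys, tx, ty)
```

The result can then be passed to the calls used above, for example plt.plot(xs, ys, lw=1.5, c='k') followed by plt.text(tx, ty, "ns", ha='center', va='bottom').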
Upvotes: 81
Reputation: 1044
One may also be interested in adding several annotations to different pairs of boxes. In that case, it is useful to place the lines and text along the y-axis automatically. Other contributors and I wrote a small function for this (see the GitHub repo); it stacks the lines on top of one another without overlap. Annotations can be placed either inside or outside the plot, and several statistical tests are implemented: Mann-Whitney and the t-test (independent and paired). Here is a minimal example.
import matplotlib.pyplot as plt
import seaborn as sns
from statannot import add_stat_annotation
sns.set(style="whitegrid")
df = sns.load_dataset("tips")
x = "day"
y = "total_bill"
order = ['Sun', 'Thur', 'Fri', 'Sat']
ax = sns.boxplot(data=df, x=x, y=y, order=order)
add_stat_annotation(ax, data=df, x=x, y=y, order=order,
                    box_pairs=[("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")],
                    test='Mann-Whitney', text_format='star', loc='outside', verbose=2)
# second example: boxes split by hue, drawn on a new figure
plt.figure()
x = "day"
y = "total_bill"
hue = "smoker"
ax = sns.boxplot(data=df, x=x, y=y, hue=hue)
# with hue, each box is identified by a (category, hue level) tuple
add_stat_annotation(ax, data=df, x=x, y=y, hue=hue,
                    box_pairs=[(("Thur", "No"), ("Fri", "No")),
                               (("Sat", "Yes"), ("Sat", "No")),
                               (("Sun", "No"), ("Thur", "Yes"))],
                    test='t-test_ind', text_format='full', loc='inside', verbose=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
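To illustrate what stacking the lines without overlap involves, here is a rough sketch of one possible greedy approach. It is not the implementation used in statannot; `stack_levels` is a hypothetical helper, and it assumes each box pair has already been converted to a pair of x positions.

```python
def stack_levels(pairs):
    """Assign each (x1, x2) pair the lowest level where it overlaps no placed bracket."""
    # normalise so that left <= right, and place narrow brackets first
    spans = sorted((tuple(sorted(p)) for p in pairs), key=lambda s: s[1] - s[0])
    levels = []   # levels[i] holds the spans already placed on level i
    result = {}
    for left, right in spans:
        for level, placed in enumerate(levels):
            # free if every span on this level lies entirely to one side
            if all(right < l or r < left for l, r in placed):
                break
        else:
            # no existing level is free: open a new one on top
            level = len(levels)
            levels.append([])
        levels[level].append((left, right))
        result[(left, right)] = level
    return result


# positions for order = ['Sun', 'Thur', 'Fri', 'Sat']: Sun=0, Thur=1, Fri=2, Sat=3
pairs = [(1, 2), (1, 3), (2, 0)]  # (Thur, Fri), (Thur, Sat), (Fri, Sun)
levels = stack_levels(pairs)
print(levels)
```

Each bracket's height could then be computed as a base height plus its level times a fixed step. Brackets that share an endpoint are treated as overlapping here, so they end up on different levels.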
Upvotes: 73