Reputation: 53
According to the seaborn documentation, its boxplot function draws whiskers that are 1.5*IQR long. However, the plot in that documentation does not seem to match this: the upper and lower whiskers have different lengths, and neither appears to be 1.5*IQR long.
Can someone shed some light on why they are different?
https://seaborn.pydata.org/generated/seaborn.boxplot.html
Upvotes: 3
Views: 2262
Reputation: 339340
In principle, the assumption is correct that the whiskers on a boxplot should be of equal length if their length is a fixed multiple of the interquartile range (IQR).
However, there are essentially two cases where this is not true. Unfortunately, the English Wikipedia article does not give those reasons, so let me translate the explanation from the German Wikipedia:
Whisker
One possible definition, originating from John W. Tukey, is to restrict the length of the whisker to at most 1.5 times the interquartile range (1.5*IQR). In this case, however, the whisker does not end exactly at this value, but rather at the value from the data that still lies inside this boundary. The length of the whisker is hence determined by the data and not solely by the interquartile range. This is the reason why the whisker does not need to be the same size on both ends of the box. If there are no values outside the 1.5*IQR boundary, the length of the whisker is determined by the minimal and maximal value. Otherwise, the values outside the whiskers are marked separately in the diagram; those values can then be treated as outliers.
A plot from the same Wikipedia page might make this more obvious:
In the case of the diagram shown in the question, the second reason almost certainly applies: the lower whisker ends at the position of the lowest data value.
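The rule from the translated passage can be checked by hand. The following is only a minimal sketch, not seaborn's or matplotlib's actual code: the helper name tukey_whiskers and the sample data are made up for illustration, and the quartiles come from the standard library's statistics.quantiles with method="inclusive", which I assume is close to, but not necessarily identical to, what the plotting library computes.

```python
from statistics import quantiles


def tukey_whiskers(values, whis=1.5):
    """Return (q1, q3, lower_whisker, upper_whisker, outliers).

    Hypothetical helper illustrating Tukey's rule; not library code.
    """
    data = sorted(values)
    # Quartiles by linear interpolation between data points (assumption:
    # this approximates the quartiles used by the plotting library).
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # The fences are the furthest points a whisker is allowed to reach.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Each whisker ends at the most extreme data value inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return q1, q3, inside[0], inside[-1], outliers


# Made-up sample: one large value that falls outside the upper fence.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
q1, q3, low, high, outliers = tukey_whiskers(sample)
print("lower whisker length:", q1 - low)
print("upper whisker length:", high - q3)
print("outliers:", outliers)
```

For this sample the IQR is 4.5, so 1.5*IQR is 6.75, yet the lower whisker is 2.25 long (it stops at the minimum, 1) and the upper whisker is 1.25 long (it stops at 9, the largest value inside the fence), while 30 is drawn as an outlier. Neither whisker is 1.5*IQR long and the two differ, which is the situation described in the question.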
Upvotes: 5
Reputation: 21
matplotlib allows for individual error bars (I assume that's what you mean by 'whiskers'). Here is the page on matplotlib: https://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html
You can explicitly define the error bars by using xerr and yerr: "xerr/yerr : scalar or array-like, shape(N,) or shape(2,N), optional
If a scalar number, len(N) array-like object, or a N-element array-like object, errorbars are drawn at +/-value relative to the data. Default is None.
If a sequence of shape 2xN, errorbars are drawn at -row1 and +row2 relative to the data."
...and plug them into their respective positions in matplotlib.axes.Axes.errorbar
Axes.errorbar(x, y, yerr=None, xerr=None, fmt='', ecolor=None, elinewidth=None, capsize=None, barsabove=False, lolims=False, uplims=False, xlolims=False, xuplims=False, errorevery=1, capthick=None, *, data=None, **kwargs)
page: https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.errorbar.html
If you want the error bars to differ in the +y and -y directions, the 2xN form of yerr quoted above does this directly. Alternatively, you can plot twice on the same figure, where the second plot has no markers except for the error bars and the center of those error bars is the mean of the +y and -y values.
Upvotes: 0