Reputation: 2449
I would like to know what algorithm is used to determine the 'outliers' in a boxplot distribution in Seaborn.
On the documentation page for seaborn.boxplot they simply state:
The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
I would really like to know what method they use. I've created boxplots from a dataframe and I seem to have a lot of 'outliers'.
Upvotes: 16
Views: 21271
Reputation: 151
You can calculate it this way:
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
It's an outlier if it is less than:
Q1 - 1.5 * IQR
or if it is greater than:
Q3 + 1.5 * IQR
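To make the rule above concrete, here is a minimal sketch using only the Python standard library. The function name tukey_outliers and the sample data are made up for illustration, and this is not seaborn's own code. It assumes linear interpolation between data points when computing quartiles, which is also the default for pandas' quantile().

```python
from statistics import quantiles


def tukey_outliers(values, whis=1.5):
    """Return (lower_fence, upper_fence, outliers) for the whis * IQR rule.

    Hypothetical helper for illustration; not part of seaborn or pandas.
    """
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Anything strictly outside the fences is flagged as an outlier.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
lower, upper, outliers = tukey_outliers(data)
print(lower, upper, outliers)  # -3.5 14.5 [50]
```

Passing a larger whis widens the fences, so fewer points are flagged.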
Upvotes: 9
Reputation: 48992
If you read further on the page you linked (or ctrl-f for "outlier"), you will see:
whis : float, optional
Proportion of the IQR past the low and high quartiles to extend the plot whiskers.
Points outside this range will be identified as outliers.
Upvotes: 8
Reputation: 12913
It appears, by testing, that seaborn uses whis=1.5 as the default.
whis is defined as:
Proportion of the IQR past the low and high quartiles to extend the plot whiskers.
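For a normal distribution, the interquartile range contains 50% of the population, and the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR contains about 99.3%. So even for perfectly normal data, roughly 0.7% of the points fall outside the whiskers, and a large sample will show many of them.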
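The normal-distribution figure can be checked numerically. This is a small sketch with the standard library's NormalDist, assuming a standard normal (mean 0, standard deviation 1) and whis=1.5. The variable names are just for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Theoretical quartiles of the standard normal distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Fences at whis * IQR beyond the quartiles.
whis = 1.5
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Share of the population that lies between the two fences.
coverage = std_normal.cdf(upper_fence) - std_normal.cdf(lower_fence)
print(f"fences: {lower_fence:.3f} to {upper_fence:.3f}")
print(f"coverage: {coverage:.4f}")
```

The fences land at roughly 2.7 standard deviations on either side of the mean, which is where the figure of about 99.3% comes from.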
Upvotes: 12