jlt199

Reputation: 2449

Python Seaborn - How are outliers determined in boxplots

I would like to know what algorithm is used to determine the 'outliers' in a boxplot distribution in Seaborn.

On their website, the seaborn.boxplot documentation simply states:

The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

I would really like to know what method they use. I've created boxplots from a dataframe and I seem to have a lot of 'outliers'.

[Image: boxplots of my dataframe]

Thanks

Upvotes: 16

Views: 21271

Answers (3)

Gabriela Trindade

Reputation: 151

You can calculate it this way:

Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)

IQR = Q3 - Q1

It's an outlier if it is less than:

Q1 - 1.5 * IQR

or if it is greater than:

Q3 + 1.5 * IQR
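As a minimal sketch of that rule on a single column, assuming pandas is available: the sample values below are made up for illustration, and the names values, lower_fence and upper_fence are not from seaborn.

```python
import pandas as pd

# Hypothetical sample data; 40 is deliberately far from the rest.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# The "fences" sit 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points strictly outside the fences are flagged as outliers.
is_outlier = (values < lower_fence) | (values > upper_fence)
outliers = values[is_outlier]

print(outliers.tolist())
```

The same lines work column by column on a DataFrame, since quantile then returns one value per column.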

Upvotes: 9

mwaskom

Reputation: 48992

If you read further on the page you linked (or ctrl-f for "outlier"), you will see:

whis : float, optional
    Proportion of the IQR past the low and high quartiles to extend the plot whiskers.
    Points outside this range will be identified as outliers.
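To see what changing whis does, here is a rough standard-library sketch. It is not seaborn's actual code: split_by_whis is a hypothetical helper, the data is made up, and it assumes linearly interpolated quartiles and whiskers that stop at the most extreme data point still inside the range, which is my understanding of the usual matplotlib convention.

```python
from statistics import quantiles


def split_by_whis(data, whis=1.5):
    """Return (whisker_low, whisker_high, outliers) for a list of numbers."""
    # method="inclusive" gives linearly interpolated quartiles.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    # Whiskers end at the most extreme data points inside the fences.
    return min(inside), max(inside), outliers


# Hypothetical sample data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 20]

default = split_by_whis(data)           # whis=1.5
wider = split_by_whis(data, whis=3.0)   # larger whis, fewer outliers

print(default)
print(wider)
```

With whis=1.5 the value 20 falls outside the range and is reported as an outlier; with whis=3.0 the range is wide enough to include it, so the upper whisker extends to 20 instead.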

Upvotes: 8

Alex

Reputation: 12913

It appears, by testing, that seaborn uses whis=1.5 as the default.

whis is defined as the

Proportion of the IQR past the low and high quartiles to extend the plot whiskers.

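For a normal distribution, the interquartile range contains 50% of the population, and the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR contains about 99.3%. So even for perfectly normal data, roughly 0.7% of points will be drawn as outliers, which adds up quickly in a large dataset.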
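The coverage for a normal distribution can be checked numerically. This is just a quick sketch using Python's standard library (statistics.NormalDist), assuming a standard normal and whis=1.5; the variable names are mine.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Quartiles of the standard normal distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Limits beyond which points would be drawn as outliers with whis=1.5.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Probability mass inside the box and inside the whisker limits.
inside_box = std_normal.cdf(q3) - std_normal.cdf(q1)
inside_fences = std_normal.cdf(upper_fence) - std_normal.cdf(lower_fence)

print(f"inside box: {inside_box:.1%}")
print(f"inside whisker limits: {inside_fences:.1%}")
```

The limits land at roughly 2.7 standard deviations either side of the mean, so a small fraction of normally distributed points is always expected to fall outside them.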

Upvotes: 12
