Computed outlier values do not match the outliers shown in the box plot

Question

This is my first time trying to detect outliers, and I am using a box plot to do it. The code below prints a lower bound (minimum value) and an upper bound (maximum value) that look wrong to me, because with these bounds every data point is flagged as an outlier. Meanwhile, the box plot itself shows the outliers correctly. What did I do wrong, and how can I fix it?

import pandas as pd
import numpy as np
import seaborn as sns

cols = pd.DataFrame({'numbers':[100,300,200,400,500,6000,800,200,200]})

sns.boxplot(x = cols.numbers)

def outlierHandling(numbers):
    numbers = sorted(numbers)
    Q1 , Q3 = np.percentile(numbers, [25,75] , interpolation='nearest')
    print('Q1,Q3 : ',Q1,Q3)
    IQR = Q3 - Q1
    lowerBound = Q1 - (1.5 * IQR)
    upperBound = Q3 - (1.5 * IQR)
    print('lowerBound,upperBound : ',lowerBound,upperBound)
    return lowerBound,upperBound

lowerbound,upperbound = outlierHandling(cols.numbers)
print('Outlier values : \n',cols[(cols.numbers < lowerbound) | (cols.numbers > upperbound)])

Output

Q1,Q3 :  200 500
lowerBound,upperBound :  -250.0 50.0
Outlier values : 
    numbers
0      100
1      300
2      200
3      400
4      500
5     6000
6      800
7      200
8      200

quest · Accepted Answer

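The mistake is in the upper bound. Your code subtracts 1.5 * IQR from Q3, which puts the upper bound (50.0) below Q1 (200), so every value is flagged as an outlier. The line should be:

    upperBound = Q3 + (1.5 * IQR)

It should be + there, not -.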
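
As a quick check of the fix, here is a minimal sketch of the corrected bounds on the same data. It is an illustration rather than the asker's exact code: the function name `iqr_bounds` is made up for this example, and it relies on `np.percentile`'s default linear interpolation instead of `interpolation='nearest'`, which gives the same quartiles (200 and 500) for these nine values.

```python
import numpy as np

# Same values as in the question.
data = [100, 300, 200, 400, 500, 6000, 800, 200, 200]

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; iqr_bounds is a hypothetical helper name."""
    # Default linear interpolation (assumption: differs from the question's 'nearest').
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr  # note the + on the upper fence

lower_bound, upper_bound = iqr_bounds(data)
outliers = [v for v in data if v < lower_bound or v > upper_bound]

print('lowerBound,upperBound : ', lower_bound, upper_bound)
print('Outlier values : ', outliers)
```

With the sign corrected, the bounds come out as -250.0 and 950.0, so only 6000 falls outside them, which agrees with the box plot.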
