random student

Reputation: 775

Computed outlier bounds do not match the outliers shown in the box plot

This is my first time trying to detect outliers, and I am using a box plot to do it. The lower bound (minimum value) and the upper bound (maximum value) printed by my code look wrong to me: with these bounds, every data point is flagged as an outlier. The box plot, meanwhile, shows the outliers I would logically expect. What did I do wrong, and how can I fix it?

This is the code for the box plot and the outlier detection:

import pandas as pd
import numpy as np
import seaborn as sns

cols = pd.DataFrame({'numbers':[100,300,200,400,500,6000,800,200,200]})

sns.boxplot(x = cols.numbers)

def outlierHandling(numbers):
    numbers = sorted(numbers)
    Q1 , Q3 = np.percentile(numbers, [25,75] , interpolation='nearest')
    print('Q1,Q3 : ',Q1,Q3)
    IQR = Q3 - Q1
    lowerBound = Q1 - (1.5 * IQR)
    upperBound = Q3 - (1.5 * IQR)
    print('lowerBound,upperBound : ',lowerBound,upperBound)
    return lowerBound,upperBound

lowerbound,upperbound = outlierHandling(cols.numbers)
print('Outlier values : \n',cols[(cols.numbers < lowerbound) | (cols.numbers > upperbound)])

Output

Q1,Q3 :  200 500
lowerBound,upperBound :  -250.0 50.0
Outlier values : 
    numbers
0      100
1      300
2      200
3      400
4      500
5     6000
6      800
7      200
8      200

Upvotes: 1

Views: 199

Answers (1)

quest

Reputation: 3926

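The mistake is in the upper bound. It should add 1.5 * IQR to Q3, not subtract it:

    upperBound = Q3 + (1.5 * IQR)

With Q3 = 500 and IQR = 300, this gives 950 instead of 50, so only 6000 falls outside the bounds.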
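
To check the fix end to end, here is a minimal self-contained sketch of the corrected bounds. It is not the asker's exact code: `outlier_bounds` is a hypothetical name, and it assumes the standard library's `statistics.quantiles` with `method="inclusive"` as a dependency-free stand-in for `np.percentile`. For this particular data it happens to give the same quartiles as in the question.

```python
from statistics import quantiles


def outlier_bounds(values):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Assumption: "inclusive" quartiles (linear interpolation between data
    # points) stand in for np.percentile from the question.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr  # plus here, not minus
    return lower, upper


numbers = [100, 300, 200, 400, 500, 6000, 800, 200, 200]
lower, upper = outlier_bounds(numbers)

# Keep only the values that fall outside the fences.
outliers = [x for x in numbers if x < lower or x > upper]

print("lowerBound, upperBound:", lower, upper)
print("Outlier values:", outliers)
```

Quartile conventions differ between libraries, so on other data sets the bounds from this sketch may differ slightly from those produced with `interpolation='nearest'`.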

Upvotes: 1
