Reputation: 775
This is my first time trying to detect outliers, and I am using a box plot to do it. The lower bound (minimum value) and upper bound (maximum value) that my code prints look wrong to me, because with those values every data point is flagged as an outlier. Meanwhile, the box plot itself shows the outliers the way I would logically expect. What did I do wrong, and how can I fix it?
import pandas as pd
import numpy as np
import seaborn as sns
cols = pd.DataFrame({'numbers':[100,300,200,400,500,6000,800,200,200]})
sns.boxplot(x = cols.numbers)
def outlierHandling(numbers):
    numbers = sorted(numbers)
    Q1 , Q3 = np.percentile(numbers, [25,75] , interpolation='nearest')
    print('Q1,Q3 : ',Q1,Q3)
    IQR = Q3 - Q1
    lowerBound = Q1 - (1.5 * IQR)
    upperBound = Q3 - (1.5 * IQR)
    print('lowerBound,upperBound : ',lowerBound,upperBound)
    return lowerBound,upperBound
lowerbound,upperbound = outlierHandling(cols.numbers)
print('Outlier values : \n',cols[(cols.numbers < lowerbound) | (cols.numbers > upperbound)])
Output
Q1,Q3 : 200 500
lowerBound,upperBound : -250.0 50.0
Outlier values :
   numbers
0      100
1      300
2      200
3      400
4      500
5     6000
6      800
7      200
8      200
Upvotes: 1
Views: 199
Reputation: 3926
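The mistake is in the upper bound: your code subtracts 1.5 * IQR from Q3, which is why the upper bound comes out as 50 and every value above it is flagged. It should add it instead:
upperBound = Q3 + (1.5 * IQR)
With Q3 = 500 and IQR = 300 this gives an upper bound of 950, so only 6000 is reported as an outlier.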
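For reference, here is a minimal sketch of the whole check with the sign fixed. The function name `iqr_bounds` and the parameter `k` are illustrative names, not part of the original code, and the sketch uses NumPy's default (linear) percentile interpolation rather than 'nearest'; for this particular data both give Q1 = 200 and Q3 = 500.

```python
import numpy as np
import pandas as pd


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    # np.percentile does not need sorted input.
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr  # note the plus sign here
    return lower, upper


cols = pd.DataFrame({'numbers': [100, 300, 200, 400, 500, 6000, 800, 200, 200]})

lower, upper = iqr_bounds(cols.numbers)
print('lowerBound,upperBound :', lower, upper)  # -250.0 950.0 for this data

# Keep only the rows that fall outside the fences.
outliers = cols[(cols.numbers < lower) | (cols.numbers > upper)]
print(outliers)
```

Only the row containing 6000 falls outside these bounds, which matches the single point drawn beyond the whisker in the box plot.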
Upvotes: 1