Suppose I need to remove the outlier, (40, 10) in this case (see the plot below), using the IQR rule. How do I do that?
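Compared to its neighbouring points, (40, 10) is definitely an outlier. However, computing the IQR of the y values gives
Q1 = 11.25
Q3 = 35.75
1.5 * IQR = 1.5 * (Q3 - Q1) = 36.75
so only points with a y value below 11.25 - 36.75 = -25.5 or above 35.75 + 36.75 = 72.5 count as outliers. Every y value lies between 0 and 49, so the rule flags no point at all.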
How do I find and remove (40, 10) if I must use the IQR rule?
Here's my code:
import pandas as pd
import matplotlib.pyplot as plt

# y = x for every point except x = 40, where y is set to 10
test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

plt.figure()
plt.scatter(test['x'], test['y'], marker='x')
plt.show()
Here's the plot generated from the above code.
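The fences quoted above can be reproduced directly on the `test` frame. This sketch applies the 1.5 * IQR rule to the `y` column only, with quartiles from pandas' default linear interpolation, and confirms that no row falls outside the fences:

```python
import pandas as pd

test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

# Quartiles of the y values (Series.quantile interpolates linearly by default)
q1, q3 = test['y'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Tukey's fences applied to y alone, ignoring x entirely
flagged = test[(test['y'] < lower) | (test['y'] > upper)]
print(q1, q3, lower, upper)  # 11.25 35.75 -25.5 72.5
print(len(flagged))          # 0
```

Because y = 10 sits well inside the range of the other y values, a one-column rule has no way to single out (40, 10).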
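The way you are using the IQR rule considers only one axis: your quartiles are computed from the y values alone. Looking at y by itself, 10 is an ordinary value, so (40, 10) is not an outlier; it only stands out when the x and y components are considered together.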
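If the IQR rule is a hard requirement, one option (a swapped-in technique, not something proposed in this answer) is to apply it to the residuals of a straight-line fit of y on x, so each point is judged by its distance from the trend rather than by its raw y value. A minimal sketch, assuming a linear trend suits the data:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

# Assumption: y follows a straight-line trend in x, so fit y = a*x + b
a, b = np.polyfit(test['x'], test['y'], 1)
residuals = test['y'] - (a * test['x'] + b)

# Tukey's fences (1.5 * IQR) applied to the residuals instead of raw y
q1, q3 = residuals.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (residuals < q1 - 1.5 * iqr) | (residuals > q3 + 1.5 * iqr)

print(test[is_outlier])  # only the row for (40, 10)
cleaned = test[~is_outlier]
```

The least-squares fit is itself pulled slightly toward the outlier; with more or larger outliers, a robust fit may be needed before the residuals are useful.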
You should use a method that works on 2D points, such as the Local Outlier Factor (LOF) or another multivariate outlier detector.
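A minimal sketch of the LOF suggestion, assuming scikit-learn's `LocalOutlierFactor` is available. `n_neighbors=5` is an assumption chosen for this small, evenly spaced data set (the library default is 20); with the default `contamination='auto'`, points whose LOF score exceeds 1.5 are labelled -1:

```python
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

# n_neighbors=5 is an assumed setting for this data, not a universal choice
lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(test[['x', 'y']])  # -1 marks outliers, 1 marks inliers

outliers = test[labels == -1]
cleaned = test[labels == 1]
print(outliers)
```

Because LOF compares each point's local density with that of its nearest neighbours in both dimensions, (40, 10), which lies about 21 units from the line, scores far above the points on the line.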