Ci Leong

Reputation: 92

How do I find the outlier (40, 10) in this case using the IQR rule?

Suppose I need to remove the outlier, (40, 10) in this case (see the plot below), using the IQR rule. How do I do that?

Compared to the neighbouring points, (40, 10) is definitely an outlier. However, for the y values:

Q1 = 11.25
Q3 = 35.75
1.5 * IQR = 1.5 * (Q3 - Q1) = 36.75

Only points with a y value lower than 11.25 - 36.75 = -25.5 or greater than 35.75 + 36.75 = 72.5 are considered outliers, so (40, 10) is not flagged.
How do I find and remove (40, 10) if I must use the IQR rule?

Here's my code:

import pandas as pd
import matplotlib.pyplot as plt

test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

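plt.figure()  # the original passed **FIGURE, a dict of figure settings that is not defined in this post; defaults are used here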
plt.scatter(test['x'], test['y'], marker='x')
plt.show()

Here's the plot generated from the above code.

[Image: scatter plot of y against x; the points lie on the line y = x except for (40, 10)]
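The quartile figures quoted in the question can be checked with a short sketch. It rebuilds the same `test` frame and applies the usual 1.5 * IQR fences to the y column only; the names `lower`, `upper` and `outliers` are illustrative and not from the original post.

```python
import pandas as pd

# Same data as in the question: y equals x except at x = 40, where y = 10.
test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

# Quartiles and fences computed on the y values alone.
q1 = test['y'].quantile(0.25)
q3 = test['y'].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Rows whose y value falls outside the fences.
outliers = test[(test['y'] < lower) | (test['y'] > upper)]

print(q1, q3, lower, upper)
print(len(outliers))
```

Because y = 10 sits well inside the fences, `outliers` comes back empty, which is the behaviour the question describes.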

Upvotes: 0

Views: 138

Answers (1)

Galo Castillo

Reputation: 352

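The way you are using the IQR considers only one axis at a time (here, the y values). Taken on its own, y = 10 lies well inside the range of the other y values, so the rule does not flag (40, 10). The point only stands out when the x and y components are considered together.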
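If the IQR rule has to be kept, one option is to apply it to residuals from a fitted trend instead of the raw values, so that both coordinates enter the calculation. This is an assumption added for illustration, not something the answer prescribes. The sketch below assumes a straight-line trend fitted with `np.polyfit`; the names `residual`, `is_outlier` and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

# Assumption: a straight line describes the bulk of the data.
slope, intercept = np.polyfit(test['x'], test['y'], deg=1)
residual = test['y'] - (slope * test['x'] + intercept)

# Standard 1.5 * IQR fences, applied to the residuals instead of the raw y values.
q1, q3 = residual.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (residual < q1 - 1.5 * iqr) | (residual > q3 + 1.5 * iqr)

print(test[is_outlier])      # rows flagged by the rule
cleaned = test[~is_outlier]  # data with the flagged rows removed
```

On this data the residual at x = 40 is far below the lower fence while the other residuals stay within the fences. For data that does not follow a straight line, a different trend model would be needed.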

You should use a method that considers both dimensions together, such as Local Outlier Factor.
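As a rough sketch of that suggestion, scikit-learn's `LocalOutlierFactor` can score each (x, y) point against its nearest neighbours. The choice of `n_neighbors=5` is an assumption made for this small data set and is not a value given in the answer; the column name `lof_score` is also illustrative.

```python
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

# Same data as in the question.
test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

# Assumption: 5 neighbours is a reasonable setting for 50 points.
lof = LocalOutlierFactor(n_neighbors=5)

# fit_predict returns -1 for points judged to be outliers and 1 otherwise.
labels = lof.fit_predict(test[['x', 'y']])

# More negative scores mean a point is more isolated from its neighbours.
test['lof_score'] = lof.negative_outlier_factor_
most_anomalous = test['lof_score'].idxmin()

print(test.sort_values('lof_score').head())
print(test.loc[most_anomalous, ['x', 'y']])
```

The point (40, 10) gets the most extreme score here because its nearest neighbours on the line are much further away than those neighbours are from each other. Whether any other points are also labelled -1 depends on `n_neighbors` and the `contamination` setting, so the labels are worth inspecting before dropping rows.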

Upvotes: 0
