Reputation: 71
I wrote a function to remove outliers from my data, like this:
def remove_outliers(data):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    data = data.select_dtypes(include=numerics)
    for i in data.columns:
        Q1 = data[i].quantile(0.25)
        Q3 = data[i].quantile(0.75)
        IQR = Q3 - Q1
        data = data[~((data[i] < (Q1 - 1.5 * IQR)) | (data[i] > (Q3 + 1.5 * IQR)))]
But when I check with a boxplot, the outliers are still there. What's wrong with the code?
Upvotes: 0
Views: 682
Reputation: 11
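Your function filters the DataFrame but never returns it, so calling it evaluates to None and the DataFrame you passed in is left unchanged. You need to return the dataset at the end of the function. For example: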
def remove_outliers(data):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    data = data.select_dtypes(include=numerics)
    for i in data.columns:
        Q1 = data[i].quantile(0.25)
        Q3 = data[i].quantile(0.75)
        IQR = Q3 - Q1
        data = data[~((data[i] < (Q1 - 1.5 * IQR)) | (data[i] > (Q3 + 1.5 * IQR)))]
    return data
You haven't shown how you apply this function before drawing the box plot, but I hope this helps!
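Since the calling code wasn't posted, here is a rough sketch of how the returned value might be used. The sample DataFrame and the names `df` and `cleaned` are made up for illustration; the key point is that the result has to be assigned to a variable, because the function does not modify the original DataFrame in place.

```python
import pandas as pd


def remove_outliers(data):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    # Keep only numeric columns; non-numeric columns are dropped from the result.
    data = data.select_dtypes(include=numerics)
    for i in data.columns:
        Q1 = data[i].quantile(0.25)
        Q3 = data[i].quantile(0.75)
        IQR = Q3 - Q1
        # Rebinds the local name only; the caller's DataFrame is untouched.
        data = data[~((data[i] < (Q1 - 1.5 * IQR)) | (data[i] > (Q3 + 1.5 * IQR)))]
    return data


# Hypothetical sample data: 100 is an obvious outlier in column "a".
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 100], "b": [10, 11, 12, 13, 14, 15]})

# Assign the result; calling remove_outliers(df) on its own changes nothing.
cleaned = remove_outliers(df)

# Plot the filtered frame, not the original one (requires matplotlib):
# cleaned.boxplot()
```

Also keep in mind that a boxplot recomputes the quartiles from whatever data it is given, so a plot of `cleaned` can still flag a few points as outliers relative to the new, narrower IQR even though the filter worked.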
Upvotes: 1