Insan Cahya

Reputation: 71

Define Function to Remove Outliers

I wrote a function to remove outliers from my data:

def remove_outliers(data):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    data = data.select_dtypes(include=numerics)

    for i in data.columns:
        Q1 = data[i].quantile(0.25)
        Q3 = data[i].quantile(0.75)
        IQR = Q3 - Q1
    
        data = data[~((data[i] < (Q1 - 1.5 * IQR)) | (data[i] > (Q3 + 1.5 * IQR)))]

But when I check with a boxplot, the outliers are still there. What's wrong with the code?

Upvotes: 0

Views: 682

Answers (1)

Helen Craven

Reputation: 11

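Your function never returns the filtered DataFrame. Inside the function, data is a local name, so reassigning it does not change the DataFrame you passed in. Add a return statement at the end of the function. For example: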

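def remove_outliers(data):
    # Keep only the numeric columns; non-numeric columns are dropped from the result.
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    data = data.select_dtypes(include=numerics)

    for i in data.columns:
        # Interquartile range of the current column.
        Q1 = data[i].quantile(0.25)
        Q3 = data[i].quantile(0.75)
        IQR = Q3 - Q1

        # Keep only the rows inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
        data = data[~((data[i] < (Q1 - 1.5 * IQR)) | (data[i] > (Q3 + 1.5 * IQR)))]

    # Hand the filtered DataFrame back to the caller.
    return data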

You haven't shown how you call this function before drawing the boxplot, so make sure you plot the returned DataFrame rather than the original one. I hope this helps!
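
As a rough illustration of the point about the return value, here is a minimal sketch of how the fixed function might be called. It assumes pandas is installed; the sample DataFrame df and the column name value are made up for the example and are not from your code.

```python
import pandas as pd


def remove_outliers(data):
    # Same logic as the function above, including the return statement.
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    data = data.select_dtypes(include=numerics)

    for i in data.columns:
        Q1 = data[i].quantile(0.25)
        Q3 = data[i].quantile(0.75)
        IQR = Q3 - Q1
        data = data[~((data[i] < (Q1 - 1.5 * IQR)) | (data[i] > (Q3 + 1.5 * IQR)))]

    return data


# Hypothetical sample data; 100 lies far outside the 1.5 * IQR fences.
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 100]})

# Calling the function without keeping the result leaves df untouched.
remove_outliers(df)

# Assign the result to a name and use that for the boxplot.
cleaned = remove_outliers(df)

print(len(df), len(cleaned))
```

Only cleaned has the outlier row removed; df still holds all six rows, so a boxplot drawn from df would look the same as before. Note also that a boxplot recomputes the quartiles from whatever data it is given, so a plot of the filtered data can still mark some points if they fall outside the new, narrower fences.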

Upvotes: 1
