Removing outliers from a single column

Question

I am removing outliers from a dataset.

I decided to remove outliers from each column one by one. The columns have different numbers of missing values.

I used the code below, but it drops the whole row containing the outlier. Because my data has many NaN values, the number of rows shrank drastically.

def remove_outlier(df_in, col_name):
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3-q1 #Interquartile range
    fence_low  = q1-1.5*iqr
    fence_high = q3+1.5*iqr
    df_out = df_in.loc[(df_in[col_name] > fence_low) & (df_in[col_name] < fence_high)]
    return df_out

Then I decided to keep the rows and replace the outliers in each column with NaN instead, so I wrote this code:

def remove_outlier(df_in, col_name, thres=1.5):
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3-q1 #Interquartile range
    fence_low  = q1-thres*iqr
    fence_high = q3+thres*iqr
    mask = (df_in[col_name] > fence_high) & (df_in[col_name] < fence_low)
    df_in.loc[mask, col_name] = np.nan
    return df_in

But this code doesn't filter the outliers. It returns the same data unchanged.

What is wrong with this code? How can I correct it?

Is there a more elegant way to filter outliers?

Venkatesh Garnepudi · Accepted Answer

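Check the condition in your mask. A value cannot be greater than fence_high and less than fence_low at the same time, so with & the mask is always False and nothing gets replaced. It should be | (or), not & (and).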
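For reference, here is a minimal sketch of the corrected function with the `|` mask. The `df_in.copy()` call, the second helper `remove_outlier_where`, and the sample DataFrame are my own additions for illustration and are not part of the original question. The second helper shows one possible "more elegant" variant using `Series.where` with `Series.between`, which by default treats values exactly on a fence as inliers, the same as the mask version.

```python
import numpy as np
import pandas as pd


def remove_outlier(df_in, col_name, thres=1.5):
    """Replace IQR outliers in one column with NaN, keeping every row."""
    q1 = df_in[col_name].quantile(0.25)
    q3 = df_in[col_name].quantile(0.75)
    iqr = q3 - q1  # interquartile range
    fence_low = q1 - thres * iqr
    fence_high = q3 + thres * iqr
    # An outlier is above the upper fence OR below the lower fence.
    mask = (df_in[col_name] > fence_high) | (df_in[col_name] < fence_low)
    df_out = df_in.copy()  # assumption: leave the caller's frame untouched
    df_out.loc[mask, col_name] = np.nan
    return df_out


def remove_outlier_where(df_in, col_name, thres=1.5):
    """Hypothetical variant: keep values inside the fences, NaN elsewhere."""
    s = df_in[col_name]
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    df_out = df_in.copy()
    # between() is inclusive on both ends by default.
    df_out[col_name] = s.where(s.between(q1 - thres * iqr, q3 + thres * iqr))
    return df_out


# Made-up sample data: one obvious outlier and one pre-existing NaN.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0, np.nan],
    "b": [10, 20, 30, 40, 50, 60],
})
cleaned = remove_outlier(df, "a")
```

Either version keeps all rows and only blanks out the offending cells in the chosen column, so NaN values in other columns no longer cost you whole rows. To process several columns, call the function once per column name in a loop.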
