Interquartile Rules to Replace Outliers in Python

Question

I'm facing an issue when replacing outliers with the upper and lower boundaries using the interquartile rule. The kernel returns an error saying "Must specify axis=0 or 1".

The function that applies the interquartile rule and replaces outliers with the upper and lower boundaries is defined as follows:

def iqr(df):
    for col in df.columns:
        if df[col].dtype != object:
            Q1 = df[col].quantile(0.25)
            Q3 = df.quantile(0.75)
            IQR = Q3 - Q1
            S = 1.5*IQR
            LB = Q1 - S
            UB = Q3 + S
            df[df > UB] = UB
            df[df < LB] = LB
        else:
            break
    return df

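The dataframe is the Boston housing dataset, which can be loaded from scikit-learn:

import pandas as pd
# note: load_boston was removed in scikit-learn 1.2
from sklearn.datasets import load_boston

boston = load_boston()
df = pd.DataFrame(boston.data)
df.columns = boston.feature_names
df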

Then I use the function to replace outliers in the numerical attributes with the upper or lower boundary:

iqr(df)

But it fails with a ValueError:

ValueError: Must specify axis=0 or 1

Looking for help, thank you!

StupidWolf · Accepted Answer

Inside the loop over columns you are working with one column at a time, so you should always use df[col] rather than df. For example, in your code:

Q3 = df.quantile(0.75)

should be

Q3 = df[col].quantile(0.75)

And

df[df > UB] = UB

should be

df.loc[df[col] > UB, col] = UB

The same applies to the lower-bound line.
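
To see why the original lines misbehave, it may help to compare what the two quantile calls return. The sketch below uses a small made-up DataFrame (the name toy and the columns a and b are only for illustration, not from the question). Calling quantile on a single column gives a scalar, while calling it on the whole DataFrame gives one value per column, so any bound computed from it is no longer a single number.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})

q3_col = toy["a"].quantile(0.75)  # scalar: 75th percentile of column "a"
q3_all = toy.quantile(0.75)       # Series: one 75th percentile per column

print(type(q3_col).__name__)
print(type(q3_all).__name__)
print(q3_all)
```

With a Series-valued bound, df > UB compares every column at once and the assignment no longer targets just the current column.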

Without changing your function too much, this works:

def iqr(df):
    for col in df.columns:
        if df[col].dtype != object:
            Q1 = df[col].quantile(0.25)
            Q3 = df[col].quantile(0.75)
            IQR = Q3 - Q1
            S = 1.5*IQR
            LB = Q1 - S
            UB = Q3 + S
            df.loc[df[col] > UB,col] = UB
            df.loc[df[col] < LB,col] = LB
        else:
            continue  # skip non-numeric columns but keep checking the rest
    return df

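Consider writing the function for a single column and using apply:

def iqr(x):
    Q1, Q3 = x.quantile([0.25, 0.75])
    S = 1.5 * (Q3 - Q1)
    # clip returns a new Series capped at the lower and upper bounds
    return x.clip(lower=Q1 - S, upper=Q3 + S)

# select_dtypes returns a new frame, so assign back through the column names
num_cols = df.select_dtypes('number').columns
df[num_cols] = df[num_cols].apply(iqr)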
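
As an alternative sketch, the same capping can probably be done for all numeric columns in one step with DataFrame.clip, passing per-column bounds and axis=1. The function name clip_iqr, the parameter k, and the toy data are hypothetical and only for illustration. This version works on a copy, so the input frame is left unchanged.

```python
import pandas as pd

def clip_iqr(df, k=1.5):
    """Return a copy of df with numeric columns capped at Q1 - k*IQR and Q3 + k*IQR."""
    out = df.copy()
    num_cols = out.select_dtypes("number").columns
    q1 = out[num_cols].quantile(0.25)  # Series: one Q1 per numeric column
    q3 = out[num_cols].quantile(0.75)  # Series: one Q3 per numeric column
    spread = k * (q3 - q1)
    # axis=1 aligns the per-column bounds with the column labels
    out[num_cols] = out[num_cols].clip(lower=q1 - spread, upper=q3 + spread, axis=1)
    return out

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "label": ["x", "y", "x", "y", "x"],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})
capped = clip_iqr(toy)
print(capped)
```

Non-numeric columns such as label are skipped by select_dtypes and pass through untouched, regardless of where they sit in the frame.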
