K.W. LEE

Reputation: 63

Interquartile Rules to Replace Outliers in Python

I'm facing an issue when replacing outliers with the upper and lower boundaries from the interquartile rule: the kernel returns an error saying "Must specify axis=0 or 1".

The function that applies the interquartile rule and replaces outliers with the upper and lower boundaries is defined as follows:

def iqr(df):
    for col in df.columns:
        if df[col].dtype != object:
            Q1 = df[col].quantile(0.25)
            Q3 = df.quantile(0.75)
            IQR = Q3 - Q1
            S = 1.5*IQR
            LB = Q1 - S
            UB = Q3 + S
            df[df > UB] = UB
            df[df < LB] = LB
        else:
            break
    return df

The dataframe is the Boston housing dataset, which can be loaded from scikit-learn:

import pandas as pd
from sklearn.datasets import load_boston  # requires scikit-learn < 1.2

boston = load_boston()
df = pd.DataFrame(boston.data)
df.columns = boston.feature_names
df

Then I use the function to replace outliers in the numerical attributes with the upper or lower boundary:

iqr(df)

But it fails with a ValueError:

ValueError: Must specify axis=0 or 1

Looking for help, thank you!

Upvotes: 1

Views: 1946

Answers (2)

StupidWolf

Reputation: 46968

Inside the loop over columns, you should always use df[col] rather than df, since you are working with one column at a time. For example, in your code:

Q3 = df.quantile(0.75)

should be

Q3 = df[col].quantile(0.75)

And

df[df > UB] = UB

should be

df.loc[df[col] > UB, col] = UB

And so on.
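To see where the error seems to come from, here is a small sketch on a hypothetical toy frame (the names `toy`, `a` and `b` are made up for illustration, not taken from the question). Calling quantile on the whole frame gives one value per column as a Series, while calling it on a single column gives one number. Mixing the two makes UB a Series, and the masked assignment then does not know which axis to align it on.

```python
import pandas as pd

# Hypothetical toy data standing in for the Boston frame.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0],
                    "b": [10.0, 20.0, 30.0, 40.0, -500.0]})

q3_frame = toy.quantile(0.75)        # whole frame: one value per column
q3_column = toy["a"].quantile(0.75)  # single column: one number

print(type(q3_frame).__name__)  # Series
print(float(q3_column))         # 4.0
```

Keeping every statistic per column (df[col]) keeps Q1, Q3, LB and UB as plain numbers.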

Without changing your function too much, this works:

def iqr(df):
    for col in df.columns:
        if df[col].dtype != object:
            Q1 = df[col].quantile(0.25)
            Q3 = df[col].quantile(0.75)
            IQR = Q3 - Q1
            S = 1.5*IQR
            LB = Q1 - S
            UB = Q3 + S
            df.loc[df[col] > UB,col] = UB
            df.loc[df[col] < LB,col] = LB
        else:
            break
    return df
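A quick sanity check of the loop version above, sketched on hypothetical toy data (the frame `toy` and its values are assumptions for illustration). The function modifies the frame it receives, so the sketch passes in a copy. Note also that the `else: break` branch stops at the first object column, so any numeric columns after it would be left untouched.

```python
import pandas as pd

def iqr(df):
    # Same logic as the corrected function above: cap each column
    # at Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    for col in df.columns:
        if df[col].dtype != object:
            Q1 = df[col].quantile(0.25)
            Q3 = df[col].quantile(0.75)
            S = 1.5 * (Q3 - Q1)
            df.loc[df[col] > Q3 + S, col] = Q3 + S
            df.loc[df[col] < Q1 - S, col] = Q1 - S
        else:
            break
    return df

# Hypothetical toy data with one obvious outlier per column.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0],
                    "b": [10.0, 20.0, 30.0, 40.0, -500.0]})

capped = iqr(toy.copy())  # copy, because iqr writes into its argument
print(capped["a"].tolist())  # [1.0, 2.0, 3.0, 4.0, 7.0]
print(capped["b"].tolist())  # [10.0, 20.0, 30.0, 40.0, -20.0]
```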

Consider writing the function for just one column and using apply:

def iqr(x):
    Q1, Q3 = x.quantile([0.25, 0.75])
    S = 1.5 * (Q3 - Q1)
    x = x.copy()  # avoid modifying the caller's column in place
    x[x < Q1 - S] = Q1 - S
    x[x > Q3 + S] = Q3 + S
    return x

num_cols = df.select_dtypes('number').columns
df[num_cols] = df[num_cols].apply(iqr)
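The same per-column idea could also be written with pandas' `Series.clip`, which caps values at a lower and an upper bound in one call. This is only a sketch under assumptions: the helper name `iqr_clip` and the toy frame are hypothetical, not part of the question.

```python
import pandas as pd

def iqr_clip(x):
    # Hypothetical helper: cap one numeric column at the IQR fences.
    q1, q3 = x.quantile(0.25), x.quantile(0.75)
    s = 1.5 * (q3 - q1)
    return x.clip(lower=q1 - s, upper=q3 + s)

# Hypothetical toy data: two numeric columns and one text column.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0],
                    "b": [10.0, 20.0, 30.0, 40.0, -500.0],
                    "label": ["p", "q", "r", "s", "t"]})

# Only the numeric columns are capped; the text column is left alone.
num_cols = toy.select_dtypes("number").columns
toy[num_cols] = toy[num_cols].apply(iqr_clip)

print(toy["a"].tolist())  # [1.0, 2.0, 3.0, 4.0, 7.0]
print(toy["b"].tolist())  # [10.0, 20.0, 30.0, 40.0, -20.0]
```

Selecting the numeric column names first and assigning back through them avoids the object-column check inside the function.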

Upvotes: 1

TMBailey

Reputation: 667

To help debug this code, after you load df you could set col by hand and then run the lines from inside your iqr function one at a time.

import pandas as pd

# Make some toy data.  Could also load boston dataset.
df = pd.DataFrame(dict(a=[-10, 100], b=[-100, 25]))
df

# Get the name of the first data column.
col = df.columns[0]
col

# Check if Q1 calculation works.
Q1 = df[col].quantile(0.25)
Q1

...

Upvotes: 0
