ChaoS Adm

Reputation: 899

Returning dataframe from function is not working?

I am trying to write the following function. However, when I assign its return value back to the original dataframe, the dataframe becomes empty.

def remove_outliers(feature, df):
    q1 = np.percentile(df[feature], 25) 
    q2 = np.percentile(df[feature], 50) 
    q3 = np.percentile(df[feature], 75) 

    iqr = q3-q1    
    lower_whisker = df[df[feature] <= q1-1.5*iqr][feature].max()
    upper_whisker = df[df[feature] <= q3+1.5*iqr][feature].max()

    return  df[(df[feature] < upper_whisker) & (df[feature]>lower_whisker)] 

I am assigning as follows:

train = remove_outliers('Power',train)

Upvotes: 1

Views: 1050

Answers (1)

Cedric Zoppolo

Reputation: 4743

The problem is that lower_whisker and/or upper_whisker is NaN, so the function returns an empty DataFrame: .max() of an empty selection returns NaN, and every comparison against NaN evaluates to False. You can resolve this by checking for NaN and filtering only on the bounds that are valid.
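To see the failure in isolation, here is a minimal sketch on a made-up Series (the values are illustrative, not taken from the question's data). No value falls at or below q1-1.5*iqr, so the selection is empty and its maximum is NaN:

```python
import numpy as np
import pandas as pd

# Illustrative data only: no value lies at or below the lower fence.
s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

q1, q3 = np.percentile(s, [25, 75])
iqr = q3 - q1

# Nothing satisfies the condition, so .max() runs on an empty Series.
lower_whisker = s[s <= q1 - 1.5 * iqr].max()

print(pd.isna(lower_whisker))     # True
print((s > lower_whisker).any())  # False: every comparison with NaN is False
```

Because the mask is False for every row, indexing the DataFrame with it returns no rows.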

Below you can see a possible way to rewrite the function to resolve this:

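import numpy as np
import pandas as pd

def remove_outliers(feature, df):
    q1 = np.percentile(df[feature], 25)
    q2 = np.percentile(df[feature], 50)
    q3 = np.percentile(df[feature], 75)

    iqr = q3-q1
    # .max() of an empty selection is NaN, e.g. when nothing lies below q1-1.5*iqr
    lower_whisker = df[df[feature] <= q1-1.5*iqr][feature].max()
    upper_whisker = df[df[feature] <= q3+1.5*iqr][feature].max()
    # pd.isna is a reliable NaN test; `is np.nan` only checks object identity
    if pd.isna(lower_whisker):
        # no lower bound available: filter on the upper bound only
        return df[(df[feature] < upper_whisker)]
    elif pd.isna(upper_whisker):
        # no upper bound available: filter on the lower bound only
        return df[(df[feature]>lower_whisker)]
    else:
        return df[(df[feature] < upper_whisker) & (df[feature]>lower_whisker)]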
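As an alternative sketch (not the method used above), you could compare against the IQR fences directly, so no whisker value has to be looked up and NaN does not arise for non-empty numeric data. The function name `remove_outliers_by_fence` and the `k` parameter are hypothetical, and the sample data is made up for illustration:

```python
import pandas as pd

def remove_outliers_by_fence(feature, df, k=1.5):
    """Keep rows whose `feature` value lies within [q1 - k*iqr, q3 + k*iqr]."""
    q1 = df[feature].quantile(0.25)
    q3 = df[feature].quantile(0.75)
    iqr = q3 - q1
    # Series.between is inclusive on both ends by default.
    mask = df[feature].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Illustrative data: 100.0 is the only value outside the fences.
train = pd.DataFrame({"Power": [10.0, 11.0, 12.0, 13.0, 14.0, 100.0]})
train = remove_outliers_by_fence("Power", train)
print(train["Power"].tolist())  # [10.0, 11.0, 12.0, 13.0, 14.0]
```

Note that this version keeps values equal to the fences, whereas the function above uses strict comparisons against the whisker values and therefore drops the rows that sit exactly on a whisker. Rows where the feature is NaN are also dropped here, since `between` returns False for them.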

Upvotes: 1
