Reputation: 899
I am trying to create the following function. However, when I assign its result back to the original DataFrame, the DataFrame becomes empty.
def remove_outliers(feature, df):
    q1 = np.percentile(df[feature], 25)
    q2 = np.percentile(df[feature], 50)
    q3 = np.percentile(df[feature], 75)
    iqr = q3-q1
    lower_whisker = df[df[feature] <= q1-1.5*iqr][feature].max()
    upper_whisker = df[df[feature] <= q3+1.5*iqr][feature].max()
    return df[(df[feature] < upper_whisker) & (df[feature]>lower_whisker)]
I am assigning as follows:
train = remove_outliers('Power',train)
Upvotes: 1
Views: 1050
Reputation: 4743
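The problem you are facing is that lower_whisker and/or upper_whisker end up as NaN. This happens when the selection they are computed from is empty, for example when no value lies at or below q1-1.5*iqr. Any comparison against NaN is False, so the boolean mask selects no rows and the function returns an empty DataFrame. You can resolve this by checking for NaN and filtering only on the whiskers that are actually defined.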
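To see the failure in isolation, here is a minimal sketch. The data is made up for illustration (only the column name Power comes from the question), and it is chosen so that no value falls below the lower fence:

```python
import numpy as np
import pandas as pd

# Hypothetical data with no low outliers
df = pd.DataFrame({"Power": [10.0, 11.0, 12.0, 13.0, 14.0]})

q1, q3 = np.percentile(df["Power"], [25, 75])
iqr = q3 - q1

# Nothing lies at or below q1 - 1.5*iqr, so the selection is empty
# and .max() of an empty selection is NaN
lower_whisker = df[df["Power"] <= q1 - 1.5 * iqr]["Power"].max()

# Comparing against NaN yields False for every row
mask = df["Power"] > lower_whisker

print(lower_whisker)   # NaN
print(df[mask].empty)  # the filtered frame has no rows
```

Combining this all-False mask with any other condition via & still selects nothing, which is why the whole result is empty.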
Below you can see a possible way to rewrite the function to resolve this:
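import numpy as np
import pandas as pd

def remove_outliers(feature, df):
    q1 = np.percentile(df[feature], 25)
    q2 = np.percentile(df[feature], 50)
    q3 = np.percentile(df[feature], 75)
    iqr = q3-q1
    lower_whisker = df[df[feature] <= q1-1.5*iqr][feature].max()
    upper_whisker = df[df[feature] <= q3+1.5*iqr][feature].max()
    # .max() of an empty selection is NaN; pd.isna detects it reliably,
    # whereas an identity check such as `x is np.nan` can miss it
    if pd.isna(lower_whisker) and pd.isna(upper_whisker):
        # neither whisker could be determined: nothing to filter on
        return df
    elif pd.isna(lower_whisker):
        # no low outliers: filter on the upper whisker only
        return df[(df[feature] < upper_whisker)]
    elif pd.isna(upper_whisker):
        # no usable upper whisker: filter on the lower whisker only
        return df[(df[feature] > lower_whisker)]
    else:
        return df[(df[feature] < upper_whisker) & (df[feature] > lower_whisker)]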
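As an alternative, you could avoid the NaN case altogether by comparing against the IQR fences directly instead of looking up whisker values. This is a different technique from the function above, not a drop-in equivalent: the name remove_outliers_iqr and the parameter k are made up for this sketch, and it keeps values that lie exactly on a fence.

```python
import pandas as pd

def remove_outliers_iqr(feature, df, k=1.5):
    """Keep rows whose value lies within [q1 - k*iqr, q3 + k*iqr]."""
    q1 = df[feature].quantile(0.25)
    q3 = df[feature].quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # The fences are always defined for non-empty numeric data,
    # so no NaN check is needed here
    return df[df[feature].between(lower_fence, upper_fence)]

# Hypothetical data: one obvious high outlier
train = pd.DataFrame({"Power": [10, 12, 11, 13, 12, 11, 100]})
train = remove_outliers_iqr("Power", train)
print(train)
```

Because the fences are computed from the quartiles rather than selected from the data, an absence of outliers on either side just means no rows are dropped on that side.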
Upvotes: 1