Reputation: 86
I tried to exclude a few outliers from a pandas DataFrame, but the function just returns the same table without any difference. I can't figure out why.
def exclude_outliers(DataFrame, col_name):
    interval = 2.5*DataFrame[col_name].std()
    mean = DataFrame[col_name].mean()
    m_i = mean + interval
    DataFrame = DataFrame[DataFrame[col_name] < m_i]
outlier_column = ['util_linhas_inseguras', 'idade', 'vezes_passou_de_30_59_dias', 'razao_debito', 'salario_mensal', 'numero_linhas_crdto_aberto',
                  'numero_vezes_passou_90_dias', 'numero_emprestimos_imobiliarios', 'numero_de_vezes_que_passou_60_89_dias', 'numero_de_dependentes']
for col in outlier_column:
    exclude_outliers(df_train, col)
df_train.describe()
Upvotes: 0
Views: 85
Reputation: 383
As written, your function doesn't return anything: the assignment inside it only rebinds the local name DataFrame, so your for loop never changes df_train. Try the following:
At the end of your function, add the following line:
def exclude_outliers(DataFrame, col_name):
    ... # Function filters the DataFrame
    # Add this line to return the filtered DataFrame
    return DataFrame
And then modify your for loop to update df_train:
for col in outlier_column:
    # Now we update the DataFrame on each iteration
    df_train = exclude_outliers(df_train, col)
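To see the difference end to end, here is a minimal sketch on toy data. The sample values, the `frame` argument name, and the `n_std` parameter are assumptions added for illustration and are not part of your original code. Like your function, it only trims values above mean + 2.5 standard deviations.

```python
import pandas as pd


def exclude_outliers(frame, col_name, n_std=2.5):
    """Return only the rows whose col_name value is below mean + n_std * std."""
    upper_limit = frame[col_name].mean() + n_std * frame[col_name].std()
    # Boolean indexing builds a new DataFrame; the caller's object is untouched.
    return frame[frame[col_name] < upper_limit]


# Hypothetical toy data: nine ordinary values and one extreme value.
df_train = pd.DataFrame({"idade": [30, 31, 29, 32, 28, 30, 31, 29, 30, 500]})

exclude_outliers(df_train, "idade")             # result discarded
rows_without_assignment = len(df_train)

df_train = exclude_outliers(df_train, "idade")  # result kept
rows_with_assignment = len(df_train)

print(rows_without_assignment, rows_with_assignment)  # prints "10 9"
```

Calling the function without assigning its result leaves df_train at 10 rows; only the reassignment drops the extreme row.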
Upvotes: 1