Gutemberg Schiessl

Reputation: 86

Python, pandas exclude outliers function

I tried to exclude a few outliers from a pandas DataFrame, but the function just returns the same table unchanged. I can't figure out why.

Here is my code for excluding outliers:

def exclude_outliers(DataFrame, col_name):
    interval = 2.5*DataFrame[col_name].std()
    mean = DataFrame[col_name].mean()
    m_i = mean + interval 
    DataFrame = DataFrame[DataFrame[col_name] < m_i]
 

outlier_column = ['util_linhas_inseguras', 'idade', 'vezes_passou_de_30_59_dias', 'razao_debito', 'salario_mensal', 'numero_linhas_crdto_aberto',
                  'numero_vezes_passou_90_dias', 'numero_emprestimos_imobiliarios', 'numero_de_vezes_que_passou_60_89_dias', 'numero_de_dependentes']

for col in outlier_column:
    exclude_outliers(df_train, col)

df_train.describe()

Upvotes: 0

Views: 85

Answers (1)

benwshul

Reputation: 383

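As written, your function doesn't return anything, so it implicitly returns None. The line DataFrame = DataFrame[DataFrame[col_name] < m_i] builds a new, filtered DataFrame and binds it to the local name DataFrame inside the function. It does not modify df_train, so your for loop leaves df_train unchanged.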
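To see the problem in isolation, consider this minimal sketch with a small made-up DataFrame (the data and the helper name filter_in_function are hypothetical). Reassigning a parameter inside a function only rebinds the local name. The caller's object is untouched, and without a return statement the call evaluates to None.

```python
import pandas as pd

def filter_in_function(frame, col_name):
    # This rebinds the *local* name `frame` to a new, filtered DataFrame.
    # The caller's DataFrame object is not modified.
    frame = frame[frame[col_name] < 10]
    # No return statement, so the function implicitly returns None

df = pd.DataFrame({"x": [1, 2, 100]})
result = filter_in_function(df, "x")

print(result)   # None
print(len(df))  # 3: the original DataFrame still has all of its rows
```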

At the end of your function, add the following line:

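def exclude_outliers(DataFrame, col_name):
    interval = 2.5*DataFrame[col_name].std()
    mean = DataFrame[col_name].mean()
    m_i = mean + interval
    # Keep only the rows below the upper threshold
    DataFrame = DataFrame[DataFrame[col_name] < m_i]
    # Add this line to return the filtered DataFrame
    return DataFrame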

Then modify your for loop to reassign df_train with the returned DataFrame:

for col in outlier_column:
    # Now we update the DataFrame on each iteration
    df_train = exclude_outliers(df_train, col)
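Putting both changes together, here is a self-contained sketch on hypothetical toy data (columns "a" and "b" stand in for the real columns in outlier_column). The n_std parameter is an addition for illustration; it defaults to the 2.5 used in the question.

```python
import pandas as pd

def exclude_outliers(frame, col_name, n_std=2.5):
    # Upper threshold: mean + n_std sample standard deviations (as in the question)
    threshold = frame[col_name].mean() + n_std * frame[col_name].std()
    # Keep rows strictly below the threshold and return the new DataFrame
    return frame[frame[col_name] < threshold]

# Hypothetical toy data: column "a" has one extreme value (1000)
df_train = pd.DataFrame({
    "a": [1, 2, 3, 2, 1, 2, 3, 2, 1, 1000],
    "b": [5, 6, 5, 6, 5, 6, 5, 6, 5, 6],
})

for col in ["a", "b"]:
    # Reassign so each iteration works on the already-filtered frame
    df_train = exclude_outliers(df_train, col)

print(len(df_train))  # 9: the row with a == 1000 was removed
```

Two side effects are worth knowing. Because the frame is filtered column by column, each threshold is computed from the rows that survived the previous columns, not from the original data. Also, a comparison such as frame[col] < threshold is False for NaN, so rows with missing values in any of the listed columns are dropped as well.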

Upvotes: 1
