Kaja

Reputation: 11

Detecting outliers in df

I have a numeric data set with a few columns and hundreds of rows. It looks similar to this:

a   |  b  |  c  |  d
1   |  3  |  .3 |  26
.02 | 32  |  5  |  2.6

I am trying to detect outliers using the standard deviation (std). I found this code:

df.a[((df.a - df.a.mean()) / df.a.std()).abs() > 2]

This does what I want for a single column, but I would like to do it for the whole df, maybe in a loop? Each column has a different mean and std. It might be something simple, but I'm quite new to all this. Is it also possible to show the result as a df where the outliers keep their values and all other cells (non-outliers) are NaN or 0?

Many thanks in advance.
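For reference, here is a self-contained run of the single-column filter from the question. The values in column `a` are made up (hypothetical sample data) with one planted extreme value. Note that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical sample data: ten ordinary values and one extreme value in "a".
df = pd.DataFrame({"a": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 10.0]})

# z-score: distance of each value from the column mean, measured in
# sample standard deviations (pandas uses ddof=1 by default).
z = (df.a - df.a.mean()) / df.a.std()

# Keep only the values lying more than 2 standard deviations from the mean.
outliers_a = df.a[z.abs() > 2]
print(outliers_a)  # only index 10 (value 10.0) is returned; its z-score is about 3.01
```

The result keeps the original index labels, which is what lets it be written back into a column later.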

Upvotes: 1

Views: 127

Answers (1)

Sociopath

Reputation: 13426

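Try the code below:

for col in df.columns:
    # Keep only values more than 2 std from this column's own mean; assigning the
    # filtered Series back aligns on the index, so all other rows become NaN.
    df[col] = df[col][((df[col] - df[col].mean()) / df[col].std()).abs() > 2]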
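The loop above overwrites `df` column by column. A loop-free sketch of the same z-score rule is shown below, assuming every column is numeric. It relies on `df.mean()` and `df.std()` returning one value per column, which DataFrame arithmetic then aligns with the matching column, and on `DataFrame.where` to put either NaN or 0 in the non-outlier cells. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample frame: each column has one planted extreme value.
df = pd.DataFrame({
    "a": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 10.0],
    "b": [30.0, 31.0, 29.0, 30.0, 32.0, 28.0, 30.0, 31.0, 29.0, -50.0, 30.0],
})

# Per-column z-scores in one step: the Series returned by mean()/std()
# is aligned with df's columns, so each column uses its own mean and std.
z = (df - df.mean()) / df.std()

# Boolean mask with the same shape as df: True where a cell is an outlier.
mask = z.abs() > 2

# Outliers keep their value; every other cell becomes NaN ...
outliers_nan = df.where(mask)

# ... or 0 instead of NaN.
outliers_zero = df.where(mask, 0)

print(outliers_nan)
```

Unlike the loop, this leaves the original `df` untouched, so you can compare the outlier frame against it.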

Upvotes: 1