Reputation: 11
I have a numeric data set with a few columns and hundreds of rows. It looks similar to this:
a | b | c | d
1 | 3 | .3 | 26
.02 | 32 | 5 | 2.6
I am trying to detect outliers using std, and I found this code:
df.a[((df.a - df.a.mean()) / df.a.std()).abs() > 2]
This does what I want for a single column, but I would like to do it for the whole df, in a loop maybe? Each column has a different mean and std. It might be something simple, but I'm quite new to all this. If possible, I would like to display the outliers in df as values and the other cells (non-outliers) as NaN or 0.
Many thanks in advance.
Upvotes: 1
Views: 127
Reputation: 13426
Try the code below:
for col in df.columns:
    # Keep only values more than 2 std from the column mean; the rest become NaN on assignment
    df[col] = df[col][((df[col] - df[col].mean()) / df[col].std()).abs() > 2]
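The same per-column check can probably be done without an explicit loop, because pandas aligns df.mean() and df.std() with the columns when subtracting and dividing. Below is a minimal sketch of that idea, not a definitive implementation. The helper name mask_outliers, the threshold parameter, and the random sample data with one planted outlier are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd


def mask_outliers(df: pd.DataFrame, threshold: float = 2.0) -> pd.DataFrame:
    """Return a frame of the same shape: outliers kept as values, NaN elsewhere.

    Hypothetical helper; an outlier here is a value whose z-score
    (computed per column) exceeds `threshold` in absolute value.
    """
    # df.mean() and df.std() are Series indexed by column name, so the
    # arithmetic below is applied column by column.
    z = (df - df.mean()) / df.std()
    # where() keeps values where the condition is True and puts NaN elsewhere.
    return df.where(z.abs() > threshold)


# Made-up sample data: 200 rows, 4 columns, plus one planted outlier.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
df.loc[0, "a"] = 25.0

outliers = mask_outliers(df)
print(outliers.dropna(how="all"))  # show only rows with at least one outlier
```

Unlike the loop, this leaves the original df untouched and returns a new frame. If 0 is preferred over NaN in the non-outlier cells, outliers.fillna(0) should do it.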
Upvotes: 1