Shirley zou

How to replace outliers with the average

My question is about replacing the values flagged TRUE in my outlier column with the average. I have identified the outliers as follows:

high <- mean(df$variable1) + sd(df$variable1) * 3
low <- mean(df$variable1) - sd(df$variable1) * 3
df$Outlier <- (df$variable1 < low | df$variable1 > high)
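For anyone who wants to try the flagging step, here is a self-contained sketch on simulated data. The data frame and its values are made up for illustration; only the column names `variable1` and `Outlier` come from the question. Note that the mean and standard deviation are computed with the outliers included, so extreme points inflate the thresholds themselves.

```r
# Hypothetical example data: 100 standard-normal draws plus two planted extremes
set.seed(42)
df <- data.frame(variable1 = c(rnorm(100), 25, -30))

# Flag points lying more than 3 standard deviations from the mean
high <- mean(df$variable1) + sd(df$variable1) * 3
low  <- mean(df$variable1) - sd(df$variable1) * 3
df$Outlier <- df$variable1 < low | df$variable1 > high

# Count how many rows were flagged (the two planted values)
sum(df$Outlier)
```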

The result is a logical column containing TRUE and FALSE values. I want to replace every value flagged TRUE with the average of the remaining (non-outlying) data points.

How can I do this?

Answers (1)

glagla

To compute the mean without the outliers:

avg <- mean(df$variable1[!df$Outlier])

Then replace only the outliers:

df$variable1[df$Outlier] <- avg
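A minimal runnable sketch of the two steps above, using a small made-up data frame with a precomputed logical `Outlier` column (the values are hypothetical, chosen so the arithmetic is easy to check by hand):

```r
# Hypothetical example data with a precomputed logical flag column
df <- data.frame(
  variable1 = c(10, 12, 11, 13, 100),
  Outlier   = c(FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Mean of the non-outlying values only: (10 + 12 + 11 + 13) / 4
avg <- mean(df$variable1[!df$Outlier])

# Overwrite just the flagged rows with that mean
df$variable1[df$Outlier] <- avg

df$variable1
# [1] 10.0 12.0 11.0 13.0 11.5
```

Because the logical vector is used directly as an index, only the rows where `Outlier` is TRUE are touched.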

Or, in one line:

df$variable1[df$Outlier] <- mean(df$variable1[!df$Outlier])
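When negating a logical mask, use `!`, not unary minus (`-df$outlier`). The minus sign coerces TRUE/FALSE to -1/0, so the index becomes "drop element 1" rather than "drop the flagged elements". A small illustration with hypothetical vectors:

```r
flag <- c(TRUE, FALSE, FALSE, TRUE)
x <- c(1, 2, 3, 4)

# `!` negates the logical mask: keeps elements 2 and 3
x[!flag]
# [1] 2 3

# Unary minus coerces the logicals first: -flag is c(-1, 0, 0, -1)
# Index 0 selects nothing and -1 drops element 1, so this is NOT the complement
x[-flag]
# [1] 2 3 4
```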

(That said, replacing outliers with the average sounds to me like something you generally should not do, since it artificially shrinks the spread of the data.)
