Reputation: 31
My question is about replacing every value flagged TRUE in a column with an average. I have identified the outliers as follows:
high <- mean(df$variable1) + sd(df$variable1) * 3
low <- mean(df$variable1) - sd(df$variable1) * 3
df$Outlier <- (df$variable1 < low | df$variable1 > high)
So the result is a column of TRUE and FALSE values, and I want to replace every value of variable1 flagged TRUE with the average of the remaining data points.
What should I do :)?
Upvotes: 1
Views: 3752
Reputation: 611
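To compute the mean without the outliers (R is case-sensitive, so the names must match the question's variable1 and Outlier):
avg = mean(df$variable1[!df$Outlier])
and then replace only the outliers:
df$variable1[df$Outlier] = avg
Or, in one line (negate the logical flag with !, not -, because a minus sign turns it into numeric indices):
df$variable1[df$Outlier] = mean(df$variable1[!df$Outlier])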
(although replacing outliers with the average really sounds like something you should not do, in my opinion)
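For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The data frame contents are purely hypothetical (29 ordinary values plus one extreme value); the column names variable1 and Outlier follow the question.

```r
# Hypothetical data: 29 ordinary values plus one extreme value.
df <- data.frame(variable1 = c(rep(c(9, 10, 11), length.out = 29), 1000))

# Flag values more than 3 standard deviations from the mean, as in the question.
high <- mean(df$variable1) + sd(df$variable1) * 3
low  <- mean(df$variable1) - sd(df$variable1) * 3
df$Outlier <- df$variable1 < low | df$variable1 > high

# Mean of the rows that are NOT flagged (logical negation with !).
avg <- mean(df$variable1[!df$Outlier])

# Overwrite only the flagged rows with that mean.
df$variable1[df$Outlier] <- avg
```

Note that the Outlier column is not recomputed after the replacement, so it still marks which rows were changed. If variable1 can contain NA, mean() and sd() would presumably need na.rm = TRUE, and the NA flags would have to be handled before indexing.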
Upvotes: 1