Reputation: 17
I have a dataset (iris, for example) and I want to keep in each column only the statistical outliers, i.e. the values that deviate from the column mean by 3.5*sd or more, and turn all other values into NA.
What is the proper way of doing this in modern dplyr?
Upvotes: 0
Views: 52
Reputation: 887531
We can use between with case_when:
library(dplyr)

# Keep only values outside mean +/- 3.5 * sd; every other value becomes NA
iris1 <- iris %>%
  mutate(across(where(is.numeric), ~
    case_when(!between(., mean(.) - 3.5 * sd(.), mean(.) + 3.5 * sd(.)) ~ .)
  ))
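To check the behaviour on data that actually contains an extreme value, here is a minimal sketch with the threshold pulled out into a helper. The helper name keep_outliers, its k argument, and the toy tibble are assumptions made for illustration; they are not part of dplyr or of the original question.

```r
library(dplyr)

# Hypothetical helper: keep values at least k SDs away from the mean,
# replace everything else with NA. NAs already in x stay NA.
keep_outliers <- function(x, k = 3.5) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  # which() drops NA comparisons, so only known non-outliers are overwritten
  replace(x, which(abs(x - m) < k * s), NA)
}

# Toy data (made up): column `a` has one extreme value, column `b` has none
toy <- tibble(
  a  = c(seq(9, 11, length.out = 19), 100),
  b  = as.numeric(1:20),
  id = letters[1:20]
)

# Apply the helper to every numeric column; non-numeric columns are untouched
toy_out <- toy %>%
  mutate(across(where(is.numeric), keep_outliers))
```

In toy_out, column a keeps only the value 100, column b is all NA, and id is unchanged. The same call should work on iris by replacing toy with iris.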
Upvotes: 1