lim-lim

Reputation: 17

R: How to keep only the SD outliers in a dataset

I have a dataset (iris, for example). Within each column, I want to keep only the statistical outliers, meaning the values that deviate from the column mean by 3.5 * sd or more, and turn all other values into NA. What is the proper way to do this in modern dplyr?

Upvotes: 0

Views: 52

Answers (1)

akrun

Reputation: 887531

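We can use between with case_when. between() is TRUE for values inside mean +/- 3.5 * sd, so it is negated here to keep only the outliers; case_when returns NA for everything that does not match.

library(dplyr)

# Keep values outside mean +/- 3.5 * sd in each numeric column; all others become NA
iris1 <- iris %>% 
  mutate(across(where(is.numeric), ~ 
      case_when(!between(.x, mean(.x) - 3.5 * sd(.x), mean(.x) + 3.5 * sd(.x)) ~ .x)
            ))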
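
If it helps to check the logic, here is a minimal sketch of the same rule on a toy data frame. The helper keep_outliers and the columns of demo are made up for illustration, and using na.rm = TRUE is my assumption about how existing NAs should be handled. As far as I can tell, no iris column has a value 3.5 sd away from its mean, so on iris every numeric column would likely come back all NA; the toy column a below is built so that exactly one value survives.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep values at least k standard
# deviations from the mean and replace everything else with NA.
keep_outliers <- function(x, k = 3.5) {
  m <- mean(x, na.rm = TRUE)  # assumption: ignore existing NAs
  s <- sd(x, na.rm = TRUE)
  # which() drops NA comparisons, so existing NAs are left untouched
  replace(x, which(abs(x - m) < k * s), NA)
}

# Toy data: column a has one extreme value (100), column b has none
demo <- data.frame(
  a = c(rep(10, 19), 100),
  b = as.numeric(1:20),
  g = letters[1:20]
)

demo_out <- demo %>%
  mutate(across(where(is.numeric), keep_outliers))
```

Only the 100 in column a is kept; the rest of a and all of b become NA, and the character column g is left alone because of where(is.numeric).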

Upvotes: 1
