Marc Buehner

Reputation: 55

How to exclude data points more than 2 SD from sample mean in R

I have a data frame in R (shown below) and want to exclude data points that are more than 2 SDs from the sample mean. I need to do this by condition: specifically, I need to group by Condition and then exclude data points more than 2 SDs from the mean of medErr within each group. Any tips on how to do this? (I use the tidyverse but am a bit stuck.)

Thanks!

Nr  ID     Sex     Age  Condition   meanErr       medErr        varErr
1   21343  female  19   Causal       1.589679618   1.545205213  0.93076650
2   21343  female  19   Non-Causal   1.002208099   1.009241219  0.45208960
3   21363  female  20   Causal       3.138516587   2.630161424  5.74271903
4   21363  female  20   Non-Causal   1.512882702   1.245398206  1.24308910
5   21368  female  20   Causal      -0.425156892  -0.382225350  0.04519723
6   21368  female  20   Non-Causal   0.431359690   0.433967936  0.14884018

Upvotes: 0

Views: 748

Answers (1)

DaveArmstrong

Reputation: 21992

How about something like this:

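library(dplyr)

dat %>%
  group_by(Condition) %>%
  # flag rows more than 2 SDs above or below the mean of medErr within each Condition
  mutate(out = case_when(
    medErr > mean(medErr, na.rm=TRUE) + 2*sd(medErr, na.rm=TRUE) ~ 1,
    medErr < mean(medErr, na.rm=TRUE) - 2*sd(medErr, na.rm=TRUE) ~ 1,
    TRUE ~ 0)) %>%
  # keep only the unflagged rows, then drop the grouping
  filter(out == 0) %>%
  ungroup()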
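
If you don't need the flag column, the same idea can probably be written as a single grouped filter() on the absolute deviation. This is only a sketch: dat is rebuilt here from the six rows in the question (keeping just the columns that matter), and trimmed is a name I made up for the result.

```r
library(dplyr)

# Toy data copied from the question (only the columns needed here).
dat <- data.frame(
  Nr = 1:6,
  ID = c(21343, 21343, 21363, 21363, 21368, 21368),
  Condition = rep(c("Causal", "Non-Causal"), times = 3),
  medErr = c(1.545205213, 1.009241219, 2.630161424,
             1.245398206, -0.382225350, 0.433967936)
)

# 'trimmed' is a hypothetical name for the filtered result.
trimmed <- dat %>%
  group_by(Condition) %>%
  # mean() and sd() are evaluated within each Condition because of group_by()
  filter(abs(medErr - mean(medErr, na.rm = TRUE)) <=
           2 * sd(medErr, na.rm = TRUE)) %>%
  ungroup()
```

One difference to be aware of: filter() drops rows where the condition is NA, so rows with a missing medErr are removed here, whereas the case_when() version keeps them (they fall through to TRUE ~ 0). Also, with only three rows per condition, as in the sample, no point can be more than 2 SDs from its group mean, so all six rows are retained; you need the full data to see anything excluded.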

Upvotes: 2
