How to exclude data points more than 2 SD from sample mean in R

Question

I have a data frame in R (shown below) and want to exclude data points that are more than 2 SDs from the sample mean, separately for each condition. Specifically, I need to group by Condition and then exclude data points whose medErr is more than 2 SDs from that group's mean medErr. Any tips on how to do this? I use the tidyverse but am a bit stuck.

Thanks!

Nr	ID	Sex	Age	Condition	meanErr	medErr	varErr
1	21343	female	19	Causal	1.589679618	1.545205213	0.93076650
2	21343	female	19	Non-Causal	1.002208099	1.009241219	0.45208960
3	21363	female	20	Causal	3.138516587	2.630161424	5.74271903
4	21363	female	20	Non-Causal	1.512882702	1.245398206	1.24308910
5	21368	female	20	Causal	-0.425156892	-0.382225350	0.04519723
6	21368	female	20	Non-Causal	0.431359690	0.433967936	0.14884018

DaveArmstrong · Accepted Answer

How about something like this:

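library(dplyr)

dat %>%
  group_by(Condition) %>%
  # flag rows more than 2 SDs above or below the group mean of medErr
  mutate(out = case_when(
    medErr > mean(medErr, na.rm = TRUE) + 2 * sd(medErr, na.rm = TRUE) ~ 1,
    medErr < mean(medErr, na.rm = TRUE) - 2 * sd(medErr, na.rm = TRUE) ~ 1,
    TRUE ~ 0)) %>%
  # keep only the unflagged rows
  filter(out == 0)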
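
If the helper column isn't needed, the same idea can probably be written as a single grouped filter() on the absolute deviation from the group mean. This is only a sketch: `toy` and `trimmed` are made-up names, and the values are hypothetical data invented so that one point (the 10 in Causal) clearly falls outside 2 SDs. One difference to be aware of: filter() drops rows where the condition is NA, so rows with a missing medErr are removed here, whereas the case_when() version above keeps them.

```r
library(dplyr)

# Hypothetical toy data: ten values per condition, with one obvious
# outlier (10) in the "Causal" group.
toy <- data.frame(
  Condition = rep(c("Causal", "Non-Causal"), each = 10),
  medErr = c(1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 10,
             2.0, 2.1, 1.9, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.0)
)

trimmed <- toy %>%
  group_by(Condition) %>%
  # mean() and sd() are computed within each Condition group;
  # keep rows whose medErr is within 2 SDs of that group's mean
  filter(abs(medErr - mean(medErr, na.rm = TRUE)) <=
           2 * sd(medErr, na.rm = TRUE)) %>%
  ungroup()

# Rows remaining per condition after trimming
count(trimmed, Condition)
```

The ungroup() at the end is optional; it just returns an ordinary ungrouped tibble so that later steps aren't accidentally computed per condition.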
