I have a data frame in R (sample below) and want to exclude data points that are more than 2 SDs from the sample mean. This has to be done per condition: specifically, I need to group by Condition and then exclude data points whose medErr is more than 2 SDs from that condition's mean medErr. Any tips on how to do this? I use the tidyverse but am a bit stuck.
Thanks!
Nr | ID | Sex | Age | Condition | meanErr | medErr | varErr |
---|---|---|---|---|---|---|---|
1 | 21343 | female | 19 | Causal | 1.589679618 | 1.545205213 | 0.93076650 |
2 | 21343 | female | 19 | Non-Causal | 1.002208099 | 1.009241219 | 0.45208960 |
3 | 21363 | female | 20 | Causal | 3.138516587 | 2.630161424 | 5.74271903 |
4 | 21363 | female | 20 | Non-Causal | 1.512882702 | 1.245398206 | 1.24308910 |
5 | 21368 | female | 20 | Causal | -0.425156892 | -0.382225350 | 0.04519723 |
6 | 21368 | female | 20 | Non-Causal | 0.431359690 | 0.433967936 | 0.14884018 |
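To make the question reproducible, the six rows shown above can be entered as a data frame. This is a sketch that assumes the column names in the table are the actual column names of dat; the values are copied from the table.

```r
# Sample rows from the table above as a reproducible data frame
dat <- data.frame(
  Nr        = 1:6,
  ID        = c(21343, 21343, 21363, 21363, 21368, 21368),
  Sex       = "female",                                  # recycled to all 6 rows
  Age       = c(19, 19, 20, 20, 20, 20),
  Condition = rep(c("Causal", "Non-Causal"), times = 3), # alternates as in the table
  meanErr   = c(1.589679618, 1.002208099, 3.138516587, 1.512882702, -0.425156892, 0.431359690),
  medErr    = c(1.545205213, 1.009241219, 2.630161424, 1.245398206, -0.382225350, 0.433967936),
  varErr    = c(0.93076650, 0.45208960, 5.74271903, 1.24308910, 0.04519723, 0.14884018)
)
```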
How about something like this:
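library(dplyr)

dat %>%
  group_by(Condition) %>%  # mean() and sd() below are computed within each Condition
  mutate(out = case_when(
    medErr > mean(medErr, na.rm = TRUE) + 2 * sd(medErr, na.rm = TRUE) ~ 1,  # above mean + 2 SD
    medErr < mean(medErr, na.rm = TRUE) - 2 * sd(medErr, na.rm = TRUE) ~ 1,  # below mean - 2 SD
    TRUE ~ 0)) %>%  # everything else is flagged 0 and kept
  filter(out == 0) %>%
  ungroup()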
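The same rule can also be written as a single filter() on the absolute distance from the group mean. This is a sketch assuming dplyr is installed; remove_outliers_2sd is a hypothetical helper name, and the toy data are invented only to show one point being flagged. One behavioural difference: filter() drops rows where the comparison is NA (a missing medErr, or a one-row group where sd() is NA), whereas the case_when() version above keeps them.

```r
library(dplyr)

# Hypothetical helper: keep rows whose medErr lies within 2 SDs
# of the mean medErr of their own Condition.
remove_outliers_2sd <- function(df) {
  df %>%
    group_by(Condition) %>%
    filter(abs(medErr - mean(medErr, na.rm = TRUE)) <= 2 * sd(medErr, na.rm = TRUE)) %>%
    ungroup()
}

# Invented toy data: the single 10 in "Causal" is far from that group's mean
toy <- data.frame(
  Condition = rep(c("Causal", "Non-Causal"), each = 10),
  medErr    = c(rep(0, 9), 10, 1:10)
)

res <- remove_outliers_2sd(toy)  # 19 rows: only the 10 in "Causal" is removed
```

With groups as small as in the question's sample (three rows each), nothing will ever be removed: using the sample SD, no point can lie more than (n - 1)/sqrt(n) SDs from its group mean (Samuelson's inequality), and that bound first exceeds 2 at n = 6.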