Reputation: 75
I tried the following code, using the tidyverse package, to filter out outliers based on the standard deviation (SD):
rt_trimmed_data_Dec = data_Dec %>%
  group_by(Time_of_Testing, Item_Type, Group) %>%
  summarise(RT_mean = mean(RT, na.rm = TRUE), RT_sd = sd(RT, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(rt_high = RT_mean + (2.5 * RT_sd)) %>%
  mutate(rt_low = RT_mean - (2.5 * RT_sd))
Then I tried to join the two data frames so that I could apply the filter:
data_Dec_RT = data_Dec %>%
  inner_join(rt_trimmed_data_Dec) %>%
  filter(RT < rt_high) %>%
  filter(RT > rt_low)
But then I got this error:
Error: `by` required, because the data sources have no common variables
Call rlang::last_error() to see a backtrace.
> rlang::last_error()
message: `by` required, because the data sources have no common variables
class: rlang_error
backtrace:
  1. dplyr::inner_join(., rt_trimmed_data_Dec)
  9. dplyr:::common_by.NULL(by, x, y)
 11. dplyr:::bad_args("by", "required, because the data sources have no common variables")
 12. dplyr:::glubort(fmt_args(args), ..., .envir = .envir)
 13. dplyr::inner_join(., rt_trimmed_data_Dec)
Could you please advise on how to solve this issue? I would highly appreciate your help.
Upvotes: 1
Views: 4799
Reputation: 85
This is pretty easy to do by z-scoring your RT column within each group using scale() and keeping only the rows whose z-score lies within ±2.5.
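library(tidyverse)
samples <- 50  # observations per participant
Ps <- 10       # number of participants
# Data frame that contains participant numbers and RT scores
data <- data.frame(participant = as.factor(rep(1:Ps, each = samples)),
                   RT = rnorm(n = samples * Ps, mean = 600, sd = 50))
# z-score RT within each participant, then keep rows within 2.5 SD of the mean;
# as.numeric() drops the one-column matrix that scale() returns
data_noOutliers <- data %>%
  group_by(participant) %>%
  mutate(zRT = as.numeric(scale(RT))) %>%
  filter(between(zRT, -2.5, 2.5)) %>%
  ungroup()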
Upvotes: 3
Reputation: 389135
I guess you can do this with
library(dplyr)
data_Dec %>%
  group_by(Time_of_Testing, Item_Type, Group) %>%
  filter(between(RT,
                 mean(RT, na.rm = TRUE) - (2.5 * sd(RT, na.rm = TRUE)),
                 mean(RT, na.rm = TRUE) + (2.5 * sd(RT, na.rm = TRUE))))
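If you would rather keep the two-step approach from the question, here is a rough sketch of how the join could be made explicit. The error says the summary table shares no columns with data_Dec, which suggests the grouping columns were lost during summarise(). One possible cause (an assumption, the post does not confirm it) is that another package loaded after dplyr, such as plyr, masks summarise(). Prefixing the calls with dplyr:: and naming the join keys in by guards against that. The toy data_Dec below is a made-up stand-in, and rt_limits is just an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for data_Dec; only the column names come from the question.
set.seed(1)
data_Dec <- data.frame(
  Time_of_Testing = rep(c("pre", "post"), each = 100),
  Item_Type       = rep(c("word", "nonword"), times = 100),
  Group           = rep(c("A", "B"), each = 50, times = 2),
  RT              = rnorm(200, mean = 600, sd = 50)
)

# Per-cell cutoffs. The dplyr:: prefixes make sure dplyr's verbs are used
# even if another attached package defines functions with the same names.
rt_limits <- data_Dec %>%
  dplyr::group_by(Time_of_Testing, Item_Type, Group) %>%
  dplyr::summarise(RT_mean = mean(RT, na.rm = TRUE),
                   RT_sd   = sd(RT, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(rt_high = RT_mean + 2.5 * RT_sd,
                rt_low  = RT_mean - 2.5 * RT_sd)

# Join on the grouping columns explicitly, then keep rows inside the cutoffs.
data_Dec_RT <- data_Dec %>%
  dplyr::inner_join(rt_limits,
                    by = c("Time_of_Testing", "Item_Type", "Group")) %>%
  dplyr::filter(RT > rt_low, RT < rt_high)
```

Running names(rt_limits) before the join is a quick way to check that the three grouping columns are actually present. If they are missing, the summarise() step is the place to look.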
Upvotes: 1