How to remove outliers based on standard deviation using tidyverse?

Question

I tried this code, using the tidyverse package, to filter outliers based on the standard deviation (sd).

rt_trimmed_data_Dec = data_Dec %>%
  group_by(Time_of_Testing, Item_Type, Group) %>%
  summarise(RT_mean = mean(RT, na.rm = TRUE), RT_sd = sd(RT, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(rt_high = RT_mean + (2.5 * RT_sd)) %>%
  mutate(rt_low = RT_mean - (2.5 * RT_sd))

Then I tried to join the two data frames to apply the filter.

data_Dec_RT = data_Dec %>%
  inner_join(rt_trimmed_data_Dec) %>%
  filter(RT < rt_high) %>%
  filter(RT > rt_low)

But then I got this error:

Error: `by` required, because the data sources have no common variables
Call rlang::last_error() to see a backtrace.
> rlang::last_error()
message: by required, because the data sources have no common variables
class: rlang_error
backtrace:
  1. dplyr::inner_join(., rt_trimmed_data_Dec)
  9. dplyr:::common_by.NULL(by, x, y)
 11. dplyr:::bad_args("by", "required, because the data sources have no common variables")
 12. dplyr:::glubort(fmt_args(args), ..., .envir = .envir)
 13. dplyr::inner_join(., rt_trimmed_data_Dec)

Could you please advise on how to solve this issue? I would highly appreciate your help.

Ronak Shah · Accepted Answer

I guess you can do this in one step with a grouped filter(), which avoids the join entirely:

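library(dplyr)

# Within each group, keep only rows whose RT lies within 2.5 standard
# deviations of that group's mean (between() includes both bounds).
data_Dec %>%
  group_by(Time_of_Testing, Item_Type, Group) %>%
  filter(between(RT, mean(RT, na.rm = TRUE) - (2.5 * sd(RT, na.rm = TRUE)),
                     mean(RT, na.rm = TRUE) + (2.5 * sd(RT, na.rm = TRUE))))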
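
To see the grouped filter in action without the original data, here is a minimal sketch on made-up values. The `toy` data frame, its single `Group` column, and the object names `trimmed_filter`, `bounds`, and `trimmed_join` are assumptions for illustration only; they are not from the question. The sketch also shows the join route from the question with an explicit `by`, which should keep the same rows as long as the comparison is inclusive on both sides, as `between()` is.

```r
library(dplyr)

# Hypothetical toy data: two groups of 20 ordinary RTs plus one extreme value each.
toy <- tibble(
  Group = rep(c("A", "B"), each = 21),
  RT    = c(seq(480, 520, length.out = 20), 5000,
            seq(780, 820, length.out = 20), 9000)
)

# Approach 1: grouped filter. mean() and sd() are evaluated within each group.
trimmed_filter <- toy %>%
  group_by(Group) %>%
  filter(between(RT,
                 mean(RT, na.rm = TRUE) - 2.5 * sd(RT, na.rm = TRUE),
                 mean(RT, na.rm = TRUE) + 2.5 * sd(RT, na.rm = TRUE))) %>%
  ungroup()

# Approach 2: compute per-group bounds, then join them back on the grouping column.
bounds <- toy %>%
  group_by(Group) %>%
  summarise(rt_low  = mean(RT, na.rm = TRUE) - 2.5 * sd(RT, na.rm = TRUE),
            rt_high = mean(RT, na.rm = TRUE) + 2.5 * sd(RT, na.rm = TRUE))

trimmed_join <- toy %>%
  inner_join(bounds, by = "Group") %>%       # explicit join key
  filter(RT >= rt_low, RT <= rt_high) %>%    # inclusive, to match between()
  select(-rt_low, -rt_high)

# Rows remaining per group after trimming
count(trimmed_filter, Group)
```

With the question's data, the join key would presumably be `by = c("Time_of_Testing", "Item_Type", "Group")`. Note that the error message says the two tables shared no column names, so it is worth checking with `names(rt_trimmed_data_Dec)` that the grouping columns actually survived the `summarise()` step.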
