How to identify and remove outliers in a data.frame using R?

Question

I have a data.frame that contains multiple outliers. I suspect that these outliers are producing results that differ from what I expected.

I tried to use this tip but it didn't work as I still have very different values: https://www.r-bloggers.com/2020/01/how-to-remove-outliers-in-r/

I also tried a solution using the rstatix package, but I can't remove the outliers from my data.frame:

library(rstatix)
library(dplyr)

df <- data.frame(
  sample = 1:20,
  score = c(rnorm(19, mean = 5, sd = 2), 50))

View(df)

out_df<-identify_outliers(df$score)#identify outliers

df2<-df#copy df

df2<- df2[-which(df2$score %in% out_df),]#remove outliers from df2

View(df2)

akrun · Accepted Answer

identify_outliers expects a data.frame as input, not a vector such as df$score, i.e. the usage is

identify_outliers(data, ..., variable = NULL)

where

... - One unquoted expressions (or variable name). Used to select a variable of interest. Alternative to the argument variable.

df2 <- subset(df, !score %in% identify_outliers(df, "score")$score)
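
If you would rather not depend on rstatix, the same idea can be sketched in base R. This is a minimal sketch, assuming the usual boxplot rule (values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR count as outliers), which is the rule the rstatix documentation describes for identify_outliers. The names q, iqr, lower and upper are only illustrative, and set.seed() is added here just to make the example reproducible.

```r
# Rebuild the example data from the question (seed added for reproducibility)
set.seed(42)
df <- data.frame(
  sample = 1:20,
  score = c(rnorm(19, mean = 5, sd = 2), 50)
)

# First and third quartiles of the score column
q <- quantile(df$score, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- unname(q[2] - q[1])

# Fences of the 1.5 * IQR rule (assumed cutoff; adjust the coefficient if needed)
lower <- unname(q[1]) - 1.5 * iqr
upper <- unname(q[2]) + 1.5 * iqr

# Keep only the rows whose score lies inside the fences
df2 <- df[df$score >= lower & df$score <= upper, ]
```

Filtering with a logical condition also avoids a pitfall of the df[-which(...), ] pattern in the question: when which() finds no matches it returns an empty index, and negating an empty index selects zero rows instead of all of them.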
