Diante

Reputation: 149

Identifying the outliers in a data set in R

I have a data set and know how to get the five-number summary using the summary command. Now I need to get the instances above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR. Since these thresholds are just numbers, how would I return the instances from the data set that lie above the upper one or below the lower one?

Upvotes: 7

Views: 85167

Answers (4)

Sam16

Reputation: 1

If you are trying to identify the outliers in your dataset using the 1.5 * IQR rule, the Boxplot function from the car package will give you the row number of each case that is an outlier within its level of the grouping variable (both below Q1 - 1.5 * IQR and above Q3 + 1.5 * IQR). It also draws a boxplot of your data, which gives insight into its distribution.

library(car)

Boxplot(DV ~ IV, data = datafile)

Where:

DV = measured variable
IV = grouping variable
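
If you would rather not depend on car, the same per-group rule can be sketched in base R. This is only an illustration: the data frame below is made up, is_outlier is a hypothetical helper name, and the quartiles come from quantile() with its default settings, so the fences may differ slightly from the hinges a boxplot uses.

```r
# Hypothetical example data; DV and IV mirror the names used above.
datafile <- data.frame(
  IV = rep(c("a", "b"), each = 6),
  DV = c(1, 2, 3, 4, 5, 50, 10, 11, 12, 13, 14, -40)
)

# TRUE where a value lies outside the 1.5 * IQR fences of the vector it is in.
is_outlier <- function(v) {
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE)
  h <- 1.5 * (q[[2]] - q[[1]])
  v < q[[1]] - h | v > q[[2]] + h
}

# ave() applies is_outlier within each level of IV and keeps the row order;
# it returns 0/1 because DV is numeric, so convert back to logical.
flag <- as.logical(ave(datafile$DV, datafile$IV, FUN = is_outlier))

outlier_rows <- which(flag)   # row numbers of the outlying cases
datafile[outlier_rows, ]      # the outlying rows themselves
```

Here each value is judged against the fences of its own group, so a value that is unremarkable overall can still be flagged within its group.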

Upvotes: 0

Rali

Reputation: 31

You can refer to the function remove_outliers from this answer. It replaces every value outside the 1.5 * IQR fences with NA, so the positions that become NA are the outliers you are looking for.

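remove_outliers <- function(x, na.rm = TRUE, ...) {
    # First and third quartiles of x
    qnt <- quantile(x, probs=c(.25, .75), na.rm = na.rm, ...)
    # Distance from the quartiles to the fences
    H <- 1.5 * IQR(x, na.rm = na.rm)
    y <- x
    # Blank out everything below the lower fence or above the upper fence
    y[x < (qnt[1] - H)] <- NA
    y[x > (qnt[2] + H)] <- NA
    y
}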
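
Since the question asks for the outlying instances rather than a cleaned vector, one way to use this function is to look for the positions that it newly set to NA. A small sketch with made-up data (x and out_idx are names chosen only for this example):

```r
# Same function as above, repeated so this snippet runs on its own.
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

x <- c(2, 3, 4, 5, 6, 7, 100, NA)  # hypothetical data with one genuine NA

cleaned <- remove_outliers(x)

# Outliers are the positions that are NA after cleaning but were not NA before.
out_idx <- which(is.na(cleaned) & !is.na(x))
out_idx      # positions of the outliers
x[out_idx]   # the outlying values
```

The extra !is.na(x) check keeps values that were already missing from being reported as outliers.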

Upvotes: 3

G5W

Reputation: 37621

You can get this using boxplot. If your variable is x,

OutVals = boxplot(x)$out
which(x %in% OutVals)

If you are annoyed by the plot, you could use

OutVals = boxplot(x, plot=FALSE)$out
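
A self-contained sketch of the same idea, using boxplot.stats (the function that computes these statistics without drawing anything). The data are made up. Note that the boxplot statistics are based on the hinges from fivenum, which can differ slightly from the quartiles that summary reports, and that %in% matches by value, so every element equal to an outlying value is returned.

```r
x <- c(2, 3, 4, 5, 6, 7, 100)  # hypothetical data

# Values beyond 1.5 * IQR from the hinges (coef = 1.5 is the default).
out_vals <- boxplot.stats(x)$out

# Positions in x that hold one of those values.
out_idx <- which(x %in% out_vals)

out_vals
out_idx
```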

Upvotes: 23

Bob Jansen

Reputation: 1287

If your dataset is x, you can get the first and third quartiles using

summary(x)[["1st Qu."]]

and

summary(x)[["3rd Qu."]]

Then compute the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR from those two values and compare x against them to pick out the instances you want.
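
The comparison step could look roughly like this. It is a sketch with made-up data; it takes the quartiles from quantile() rather than from the summary object, which with default settings should give the same values, and the variable names are chosen only for this example.

```r
x <- c(2, 3, 4, 5, 6, 7, 100)  # hypothetical data

q1 <- quantile(x, 0.25, names = FALSE)  # first quartile
q3 <- quantile(x, 0.75, names = FALSE)  # third quartile
iqr <- q3 - q1

lower <- q1 - 1.5 * iqr  # lower fence
upper <- q3 + 1.5 * iqr  # upper fence

# Logical vector: TRUE for every value outside the fences.
is_out <- x < lower | x > upper

which(is_out)  # positions of the outliers
x[is_out]      # the outlying values
```

If x is a column of a data frame, the same logical vector can be used to index the rows, for example df[is_out, ] for a data frame called df.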

Upvotes: 4
