Reputation: 149
I have a data set and know how to get the five-number summary using the summary command. Now I need the instances above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR. Since the summary only gives me numbers, how would I return the instances from the data set that lie above or below those cutoffs?
Upvotes: 7
Views: 85167
Reputation: 1
If you are trying to identify the outliers in your dataset using the 1.5 * IQR rule, there is a simple function that gives you the row number of each case that is an outlier within its level of the grouping variable (both below Q1 - 1.5 * IQR and above Q3 + 1.5 * IQR). It also draws a boxplot of your data, which gives some insight into its distribution.
library(car)
Boxplot(DV ~ IV, data = datafile)
Where:
DV = measured variable
IV = grouping variable
Upvotes: 0
Reputation: 31
You can use the remove_outliers function from this answer. It sets every value outside the 1.5 * IQR fences to NA, so the positions that become NA are your outliers.
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}
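Since the function returns the cleaned vector rather than the outliers themselves, one way to recover the outlying positions is to look for entries that are NA after cleaning but were not NA before. The sketch below repeats the function so it runs on its own; the sample vector x is made up for illustration.

```r
# Same function as above, repeated so this snippet is self-contained.
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Made-up data with one genuine missing value and one extreme value.
x <- c(1, 2, 3, NA, 4, 5, 100)

cleaned <- remove_outliers(x)

# Positions that turned into NA, excluding values that were already missing.
outlier_idx <- which(is.na(cleaned) & !is.na(x))

# The outlying values themselves.
outlier_vals <- x[outlier_idx]
```

The !is.na(x) check matters only when the data already contains missing values; without it, those would be reported as outliers too.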
Upvotes: 3
Reputation: 37621
You can get this using boxplot. If your variable is x,
OutVals = boxplot(x)$out
which(x %in% OutVals)
If you don't want the plot to be drawn, you could use
OutVals = boxplot(x, plot=FALSE)$out
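As a small worked example of the same idea, boxplot.stats() (the grDevices function that boxplot relies on for its statistics) returns the $out component without any plotting. The data below is made up. Note that these statistics are based on hinges, which can differ slightly from the quartiles reported by summary() for some sample sizes.

```r
# Made-up data: 100 is far above the rest.
x <- c(1, 2, 3, 4, 5, 100)

# Values beyond 1.5 * IQR from the hinges (coef = 1.5 is the default).
out_vals <- boxplot.stats(x)$out

# Positions of those values in x.
out_idx <- which(x %in% out_vals)
```

Here the hinges are 2 and 5, so the fences sit at -2.5 and 9.5, and only the value 100 at position 6 falls outside them.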
Upvotes: 23
Reputation: 1287
If your data is in a numeric vector x, you can get the two quartiles using
summary(x)[["1st Qu."]]
and
summary(x)[["3rd Qu."]]
Then you compute the cutoffs Q1 - 1.5 * IQR and Q3 + 1.5 * IQR from those two numbers and compare your data against them to pick out the instances you want.
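A minimal sketch of that comparison, assuming the data lives in a data frame; the names df, id and value are hypothetical. It calls quantile() directly, which is what summary() uses for the quartiles of a numeric vector.

```r
# Hypothetical example data; the column names are assumptions.
df <- data.frame(id = 1:8, value = c(10, 12, 11, 13, 12, 50, 11, -30))

# Quartiles and interquartile range.
q1 <- quantile(df$value, 0.25, names = FALSE)
q3 <- quantile(df$value, 0.75, names = FALSE)
iqr <- q3 - q1

# Cutoffs for the 1.5 * IQR rule.
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Logical mask of outlying rows, then the rows themselves.
is_outlier <- df$value < lower | df$value > upper
outlier_rows <- df[is_outlier, ]
```

With this data the quartiles are 10.75 and 12.25, so the cutoffs are 8.5 and 14.5, and outlier_rows contains the rows with value 50 and -30. If the column has missing values, pass na.rm = TRUE to quantile() and wrap the mask in which() so NA rows are not selected.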
Upvotes: 4