Reputation: 23
I'm trying to verify the upper and lower limits of the boxplot statistics (i.e. the ends of the whiskers) by comparing them to the formulas Q3+(1.5*IQR) and Q1-(1.5*IQR).
Each time I run the following code, there is a small difference between the boxplot statistic and the value from the formula.
Shouldn't these numbers be identical? Why the deviation?
# random normal distribution
df <- rnorm(500)
# convert to dataframe
df <- as.data.frame(df)
# boxplot statistics
s <- boxplot.stats(df$df)
s$stats
# Upper limit of whisker: Q3+(1.5*IQR)
s$stats[4]+(1.5*(IQR(df$df)))
# Lower limit of whisker: Q1-(1.5*IQR)
s$stats[2]-(1.5*(IQR(df$df)))
Upvotes: 2
Views: 1008
Reputation: 160447
The whiskers extend out to the most extreme data point that is at or inside Q3+(1.5*IQR). Meaning, go out to Q3+(1.5*IQR), and then pull it back until it hits data. The lower whisker works the same way with Q1-(1.5*IQR).
We can find those values with:
set.seed(42)
vec <- rnorm(500)
st <- boxplot.stats(vec)
st$stats
# [1] -2.46133548 -0.66263842 -0.03797064 0.63573211 2.45959355
###       ,--- data
###       |  ,--- that is at or inside
###       |  |      ,--- this number
###      ,-, v ,----^---------------------,
max(vec[ vec < st$stats[4]+(1.5*(IQR(vec))) ])
# [1] 2.459594
min(vec[ vec > st$stats[2]-(1.5*(IQR(vec))) ])
# [1] -2.461335
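There is a second, smaller source of deviation worth checking. As far as I can tell from its source, boxplot.stats() takes its quartiles from fivenum() (Tukey's hinges) and uses the distance between the hinges as the spread, whereas IQR() uses quantile() with its default type, so the two spreads can differ slightly. The following is a minimal sketch under that assumption; the names hinges, hinge_spread, lower_fence, upper_fence and whiskers are just illustrative.

```r
set.seed(42)
vec <- rnorm(500)
st <- boxplot.stats(vec)

# Hinges: elements 2 and 4 of fivenum(), the same values as st$stats[c(2, 4)]
hinges <- fivenum(vec)[c(2, 4)]

# Spread between the hinges; this can differ slightly from IQR(vec)
hinge_spread <- diff(hinges)

# Fences: the values given by the formula (usually not actual data points)
lower_fence <- hinges[1] - 1.5 * hinge_spread
upper_fence <- hinges[2] + 1.5 * hinge_spread

# Whisker ends: the most extreme observations at or inside the fences
whiskers <- range(vec[vec >= lower_fence & vec <= upper_fence])

all.equal(whiskers, st$stats[c(1, 5)])
# [1] TRUE
```

Comparing hinge_spread with IQR(vec) shows how far apart the two definitions are for a given sample. With continuous data like rnorm() the difference rarely changes which point ends up as the whisker, but it can with small or heavily tied samples.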
Upvotes: 2