Reputation: 43
I am making a boxplot using geom_boxplot in ggplot2. However, the whisker length does not look correct and I don't know why. Here is my data:
library(ggplot2)
value = c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
          1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
          20.7975509, 20.3102489, 18.0046679, 1.4197519)
data = data.frame(value)
ggplot(data, aes(y = value)) +
  stat_boxplot(geom = "errorbar", width = 0.3) +
  geom_boxplot(width = 0.5)
The plot looks like this:
The upper whisker coincides with the 3rd quartile. I did the calculation manually, and the result is as follows:
summary(data)
Min. : 0.6943
1st Qu.: 1.0494
Median : 1.4518
Mean : 6.0715
3rd Qu.: 7.0895
Max. :20.7976
Based on the explanation of geom_boxplot: The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge.
The IQR in my case is: 7.0895-1.0494 = 6.0401
The lower whisker should be: 1.0494 - 1.5*6.0401 = -8.01075
The upper whisker should be: 7.0895 + 1.5*6.0401 = 16.14965
I understand that a negative lower whisker is meaningless, so it is replaced by the minimum value here. But why is the upper whisker not shown? I am confused, and I could not find an example online that addresses this. Am I misunderstanding something about the ggplot settings? I would really appreciate your help and suggestions!
Upvotes: 4
Views: 891
Reputation: 66415
From the quoted section:
The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles).
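By "value" they mean one of the original data points. The whisker does not extend to the 1.5 * IQR limit itself; it stops at the most extreme data point that lies within that limit. If you plot the data, there are no values between the top hinge at 7.09 and the limit at 16.15 (hinge + 1.5*IQR), so the upper whisker has nowhere to go and sits on the hinge. If these quartiles had arisen from data with one of the values lying in that range, the upper whisker would extend to it.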
ggplot(data, aes(y = value)) +
  # show the raw data points
  geom_jitter(aes(x = 0.5), width = 0.05) +
  stat_boxplot(geom = "errorbar", width = 0.3,
               color = "red", size = 1.5) +
  geom_boxplot(width = 0.5, alpha = 0.5) +
  # upper hinge (7.09) and hinge + 1.5 * IQR (16.15)
  geom_hline(yintercept = c(7.09, 16.15), lty = "dashed")
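If you want to check this numerically rather than by eye, here is a minimal base-R sketch of the whisker rule quoted above. The variable names (lower_limit, upper_limit, inside and so on) are mine, and clamping the whisker to the hinge is my assumption about how the plot behaves when no data point lies between the hinge and the limit; I believe it matches what ggplot2 draws, but treat it as an illustration and not as the package's actual code.

```r
# Same data as in the question.
value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)

# Hinges computed with quantile()'s default method, as summary() does.
q <- quantile(value, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Limits at 1.5 * IQR from each hinge. These are cut-offs, not whisker ends.
lower_limit <- q[1] - 1.5 * iqr
upper_limit <- q[2] + 1.5 * iqr

# Data points outside the limits are the ones drawn as outliers.
inside <- value >= lower_limit & value <= upper_limit
outliers <- sort(value[!inside])

# Whiskers end at the most extreme data points inside the limits.
# Assumption: a whisker never ends inside the box, so clamp it to the hinge.
lower_whisker <- min(q[1], value[inside])
upper_whisker <- max(q[2], value[inside])

print(c(lower_limit = lower_limit, upper_limit = upper_limit))
print(c(lower_whisker = lower_whisker, upper_whisker = upper_whisker))
print(max(value[inside]))  # largest data point that is not an outlier
print(outliers)
```

With this data, the largest point inside the limits (about 3.45) is below the upper hinge (about 7.09), so upper_whisker ends up equal to the hinge, while the four values above 18 fall outside the upper limit and are drawn as outliers. The lower whisker ends at the minimum, 0.694, because that is a real data point above the (negative) lower limit.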
Upvotes: 5