Terrence J

Reputation: 181

How to create a boxplot in R, with box representing the 15th and 85th percentiles, rather than the default 25th and 75th?

I am using this built-in dataset to explain what I would like to do, as my data is essentially the same. The standard boxplot drawn by bwplot uses the 25th and 75th percentiles as the bottom and top of the boxes.

Is there a way to alter the boxplot so that the top and bottom of the boxes are instead the 85th and 15th percentiles of each factor level? If this is not possible, is there a way to draw those percentiles as lines over each factor level instead?

    library(lattice)  # bwplot() comes from lattice; ChickWeight ships with R's datasets package
    data <- ChickWeight[,c("Diet", "weight")] 
    bwplot(data$weight~data$Diet)


I would additionally like to be able to plot a static range as a background of the plot (for example a shaded area between 150 and 250) - how can I do this?

Apologies in advance, as I am fairly new to R. Really appreciate any help for this simple task, I'm finding a lot of the R documentation a bit hard to follow.

Upvotes: 1

Views: 1467

Answers (2)

Jeff

Reputation: 738

bwplot computes the box statistics with boxplot.stats, which in turn calls fivenum to get the hinges. You can change this by passing your own version of boxplot.stats through the stats argument. Here I only changed one line (the original is left in as a comment). You can add the desired shading in the my.panel function:

    library(lattice)

    # Copy of boxplot.stats with the hinges moved to the 15th/85th percentiles
    my.boxplot.stats <- function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE) {
        if (coef < 0)
            stop("'coef' must not be negative")
        nna <- !is.na(x)
        n <- sum(nna)
        # stats <- stats::fivenum(x, na.rm = TRUE)   # original line
        stats <- quantile(x, probs = c(0.0, 0.15, 0.5, 0.85, 1.0), na.rm = TRUE)
        iqr <- diff(stats[c(2, 4)])   # now the 15th-to-85th spread, not the IQR
        if (coef == 0)
            do.out <- FALSE
        else {
            out <- if (!is.na(iqr)) {
                x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef * iqr)
            }
            else !is.finite(x)
            if (any(out[nna], na.rm = TRUE))
                stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
        }
        conf <- if (do.conf)
            stats[3L] + c(-1.58, 1.58) * iqr/sqrt(n)
        list(stats = stats, n = n, conf = conf,
             out = if (do.out) x[out & nna] else numeric())
    }

    # Draw the shaded band first so the boxes are plotted on top of it
    my.panel <- function (x, y, ...) {
        panel.rect(xleft = 0, xright = 5, ybottom = 150, ytop = 250,
                   col = "lightgrey", border = 0)
        panel.bwplot(x, y, ...)
    }

    data <- ChickWeight[, c("Diet", "weight")]
    bwplot(data$weight ~ data$Diet, stats = my.boxplot.stats, panel = my.panel)
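
To check that the boxes really sit where you expect, it can help to compute the percentiles per diet directly and compare them with the default hinges. This is only a minimal sketch in base R; the names `hinges` and `default_hinges` are illustrative and not part of the code above.

```r
# Split the weights by diet once and reuse the groups
groups <- split(ChickWeight$weight, ChickWeight$Diet)

# 15th, 50th and 85th percentiles per diet (rows: percentiles, columns: diets)
hinges <- sapply(groups, quantile, probs = c(0.15, 0.5, 0.85))
print(round(hinges, 1))

# Default lower/upper hinges, as boxplot.stats() gets them from fivenum()
default_hinges <- sapply(groups, function(w) fivenum(w)[c(2, 4)])
print(default_hinges)
```

Keep in mind that `iqr` in my.boxplot.stats is now the 15th-to-85th percentile spread, so the `coef * iqr` whisker rule reaches further than in the default plot and will typically flag fewer points as outliers.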


Upvotes: 2

thc

Reputation: 9705

Check out the answer here: Adding Different Percentiles in boxplots in R

Basically, you need to first calculate the quantiles manually and add line segments manually. This is the most robust solution.
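
The manual approach could look roughly like the following. This is a sketch in base graphics rather than the code from the linked answer; the segment half-width `half` and the colour are arbitrary choices.

```r
# Per-diet 15th and 85th percentiles (rows: percentiles, columns: diets)
pct <- sapply(split(ChickWeight$weight, ChickWeight$Diet),
              quantile, probs = c(0.15, 0.85))

# Ordinary boxplot with the default 25th/75th boxes
boxplot(weight ~ Diet, data = ChickWeight, xlab = "Diet", ylab = "weight")

# Overlay a short horizontal line at each percentile;
# boxplot() draws the boxes at x = 1, 2, ..., one per factor level
pos  <- seq_len(ncol(pct))
half <- 0.4
segments(pos - half, pct["15%", ], pos + half, pct["15%", ], col = "red", lwd = 2)
segments(pos - half, pct["85%", ], pos + half, pct["85%", ], col = "red", lwd = 2)
```

This leaves the usual boxes, whiskers and outliers untouched and only adds the extra percentile markers on top.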

Alternatively, you can do this quickly with ggplot2, without the outliers. If you really need them, you can add the outliers back in manually.

    library(ggplot2)
    library(plyr)

    data <- ChickWeight[, c("Diet", "weight")]

    # Per-diet box statistics: whiskers at min/max, box at the 15th/85th percentiles
    data2 <- ddply(data, .(Diet), summarize,
                   ymin = min(weight),
                   ymax = max(weight),
                   middle = median(weight),
                   lower = quantile(weight, 0.15),
                   upper = quantile(weight, 0.85))

    ggplot(data2, aes(x = Diet)) +
        geom_boxplot(aes(ymin = ymin, ymax = ymax, middle = middle,
                         upper = upper, lower = lower),
                     stat = 'identity')
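
For the shaded background range from the question (150 to 250), one option in ggplot2 is an `annotate("rect", ...)` layer placed before the boxes. The sketch below is self-contained and computes the summary with base R instead of plyr; the `geom_blank()` layer is there on the assumption that the discrete x scale needs to be set up before the numeric rectangle is added, and the fill colour and `alpha` are arbitrary.

```r
library(ggplot2)

# Per-diet summary: extremes, median, and the 15th/85th percentiles
groups <- split(ChickWeight$weight, ChickWeight$Diet)
box_stats <- data.frame(
  Diet   = factor(names(groups), levels = levels(ChickWeight$Diet)),
  ymin   = sapply(groups, min),
  lower  = sapply(groups, quantile, probs = 0.15),
  middle = sapply(groups, median),
  upper  = sapply(groups, quantile, probs = 0.85),
  ymax   = sapply(groups, max)
)

p <- ggplot(box_stats, aes(x = Diet)) +
  # Empty layer so the x axis is treated as discrete from the start
  geom_blank() +
  # Shaded band spanning the full width, drawn before (behind) the boxes
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = 150, ymax = 250,
           fill = "grey70", alpha = 0.4) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax), stat = "identity")
print(p)
```

Using `-Inf` and `Inf` for the horizontal limits keeps the band as wide as the panel regardless of how many factor levels there are.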

Upvotes: 1
