Reputation: 121
I can't find a way to make ggplot2 show an empty level in a boxplot without padding my dataframe with actual missing values. Here is reproducible code:
library(ggplot2)
# fake data
dftest <- expand.grid(time=1:10,measure=1:50)
dftest$value <- rnorm(nrow(dftest),3+0.1*dftest$time,1)
# and let's suppose we didn't observe anything at time 2
# doesn't work even when forcing with factor(..., levels=...)
p <- ggplot(data=dftest[dftest$time!=2,],aes(x=factor(time,levels=1:10),y=value))
p + geom_boxplot()
# the only way seems to be having at least one actual missing value in the dataframe
dftest2 <- dftest
dftest2[dftest2$time==2,"value"] <- NA
p <- ggplot(data=dftest2,aes(x=factor(time),y=value))
p + geom_boxplot()
So I guess I'm missing something. This is not a problem in a balanced experiment, where the missing data can be made explicit in the dataframe. But with observed data, in a cohort for example, it means padding the data with missing values for every unobserved combination.
Upvotes: 12
Views: 9842
Reputation: 179388
We can control the breaks in a suitable scale function, in this case scale_x_discrete. Make sure you use the argument drop = FALSE:
p <- ggplot(data = dftest[dftest$time != 2, ],
            aes(x = factor(time, levels = 1:10), y = value))
p + geom_boxplot() +
  scale_x_discrete("time", breaks = factor(1:10), drop = FALSE)
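To see what drop = FALSE changes without looking at the picture, here is a minimal sketch that compares the x-axis limits of the two plots. It assumes that ggplot2's layer_scales() helper and the scale's get_limits() method behave as in recent ggplot2 releases. The names observed, p_default and p_keep, and the seed, are only illustrative.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the fake data is reproducible
dftest <- expand.grid(time = 1:10, measure = 1:50)
dftest$value <- rnorm(nrow(dftest), 3 + 0.1 * dftest$time, 1)
observed <- dftest[dftest$time != 2, ]  # nothing observed at time 2

# Same plot twice: default discrete scale vs. drop = FALSE
p_default <- ggplot(observed,
                    aes(x = factor(time, levels = 1:10), y = value)) +
  geom_boxplot()
p_keep <- p_default + scale_x_discrete("time", drop = FALSE)

# Ask the built plots which categories their x scales contain
# (assumption: get_limits() is available on the scale object)
limits_default <- layer_scales(p_default)$x$get_limits()
limits_keep <- layer_scales(p_keep)$x$get_limits()

print(limits_default)
print(limits_keep)
```

If the assumption holds, the default scale should list only the levels that occur in the data, while the drop = FALSE scale should list every declared level, including the empty one.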
I like to do my data manipulation in advance of sending it to ggplot. I think this makes the code more readable. This is how I would do it myself, but the results are the same. Note, however, that the ggplot scale gets much simpler, since you don't have to specify the breaks:
dfplot <- dftest[dftest$time != 2, ]
dfplot$time <- factor(dfplot$time, levels = 1:10)
ggplot(data = dfplot, aes(x = time, y = value)) +
  geom_boxplot() +
  scale_x_discrete("time", drop = FALSE)
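Both versions depend on the factor still declaring the level that has no rows, since drop = FALSE can only keep levels that exist. A quick base R check of the prepared data frame, as a sketch (the names all_levels, level_counts and empty_levels are only illustrative):

```r
dftest <- expand.grid(time = 1:10, measure = 1:50)
dfplot <- dftest[dftest$time != 2, ]
dfplot$time <- factor(dfplot$time, levels = 1:10)

# All declared levels, whether or not any rows use them
all_levels <- levels(dfplot$time)

# Rows per level; declared but unobserved levels get a count of zero
level_counts <- table(dfplot$time)
empty_levels <- names(level_counts)[as.vector(level_counts) == 0]

print(all_levels)
print(empty_levels)
```

If the factor is built without levels = 1:10, the unobserved value is never declared as a level, so the scale has nothing to keep.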
Upvotes: 19