5th

Reputation: 2375

compare boxplots with a single value

I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that there are too many variables (about a dozen) to fit into a single boxplot.

Additionally, the levels are too different to share one plot, so I need to use facets to keep things organised.

However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.

I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible, and secondly it got similarly screwed up in the faceting. Basically any workaround would help.

Below is the code to create the minimal example displayed here:

# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)

# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)

# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
    geom_boxplot() +  scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))

# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")

Upvotes: 0

Views: 835

Answers (1)

GGamba

Reputation: 13680

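The problem lies only in the use of test_dat$y inside the color aes. Never use $ in aes: ggplot evaluates aesthetics against the data it has rearranged for the facets, while a vector pulled out with $ keeps its original row order, so the values can end up attached to the wrong rows.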
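For reference, here is a minimal sketch of the question's own plot with that one change applied, i.e. the column referenced by its bare name `y` instead of `test_dat$y`. The data setup is copied from the question. Dropping `drop = FALSE` and converting `=` to `<-` are my own stylistic choices and are not required for the fix.

```r
library(ggplot2)
library(RColorBrewer)

# Sample data, as in the question
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(-1, 1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
x[, 2] <- x[, 2] + 20
df <- data.frame(x, y)

# Benchmark rows, one per variable
benchmark <- data.frame(y = rep("bm", 2), key = c("X1", "X2"),
                        value = c(-0.216936, 20.526312))

# Long format, with the benchmark rows appended
test_dat <- rbind(tidyr::gather(df, key, value, -y), benchmark)

# `y` is looked up inside `test_dat`, so it stays aligned with the
# rows of each facet panel
p_box <- ggplot(test_dat, aes(x = key, y = value, color = as.factor(y))) +
    geom_boxplot() +
    scale_color_manual(name = "Cluster", values = brewer.pal(8, "Set1")) +
    facet_wrap(~key, scales = "free") +
    theme(legend.position = "bottom")
p_box
```

With this change the bm category should show up as a single flat line in each panel, since it has exactly one value per variable.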

Anyway, I think your plot would improve if you used a geom_hline for the benchmark instead of hacking in a single-value boxplot:

library(ggplot2)
library(RColorBrewer)

ggplot(tidyr::gather(df,key,value,-y)) +
    geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
    geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', size = 1) +
    scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
    facet_wrap(~key,scales = "free",drop = FALSE) + 
    theme(legend.position = "bottom")

Upvotes: 1
