Jonathan

Reputation: 846

Issue with boxplot factors

I'm having an issue when trying to make side-by-side boxplots grouped by a factor. I've read several examples, but for some reason my plots are not displaying correctly. I think it's trying to plot a boxplot for each value, even though I specified the grouping variable as a factor.

I'm using the following code:

samp.norm = rnorm(1000,0,1)
samp.exp  = rexp(1000,1)
samp.unif = runif(1000)
samp = c(samp.norm,samp.exp,samp.unif)
dist = c( rep("norm",1000), rep("exp",1000), rep("unif",1000) )
DATA = as.data.frame(cbind(samp,dist))
DATA$dist= as.factor(DATA$dist)
p = ggplot(DATA, aes(x=factor(DATA$dist), y = DATA$samp)) + geom_boxplot()
p

Upvotes: 1

Views: 1517

Answers (2)

Peter Ellis

Reputation: 5894

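The problem is that cbind() returns a matrix, and every element of a matrix must have the same type, so mixing numbers and strings falls back to the lowest common denominator, in this case "character". When as.data.frame() then converts that matrix, DATA$samp ends up as a factor (or as character in R 4.0.0 and later, where strings are no longer converted to factors by default) rather than numeric, so ggplot sees a discrete y variable instead of a continuous one. Holding columns of different types side by side is exactly what data frames were invented for.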

Try

DATA = data.frame(samp, dist)

instead of the more complicated line you've got and it all should work.
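If you want to see the coercion for yourself, a minimal check along these lines compares the two constructions. The three-element vectors are made up for illustration and are not your data:

```r
# Small illustrative vectors (not the data from the question)
samp <- c(0.5, -1.2, 2.3)
dist <- c("norm", "exp", "unif")

# cbind() builds a matrix, and a matrix holds a single type,
# so the numbers are converted to character strings
m <- cbind(samp, dist)
is.character(m)        # TRUE

# Going through the matrix, samp is no longer numeric
# (a factor before R 4.0.0, character from 4.0.0 onward)
bad <- as.data.frame(m)
is.numeric(bad$samp)   # FALSE

# data.frame() keeps each column's own type
good <- data.frame(samp, dist)
is.numeric(good$samp)  # TRUE
```

Running str(bad) and str(good) shows the same difference column by column.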

As an aside, you could also use the much simpler

p=ggplot(DATA, aes(x=dist, y = samp)) + geom_boxplot()

rather than your second-last line. Once you have told ggplot() that you are using DATA, you don't need to tell it where to find dist and samp, i.e. there is no need for DATA$dist, just dist. Also, as dist is already a factor, you don't need to wrap it in factor().
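Putting both changes together, the whole script might look roughly like the sketch below. It assumes ggplot2 is installed; the set.seed() call is my addition for reproducibility and is not in the question.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

set.seed(1)  # added only so the sketch is reproducible

# Same three samples as in the question
samp.norm <- rnorm(1000, 0, 1)
samp.exp  <- rexp(1000, 1)
samp.unif <- runif(1000)
samp <- c(samp.norm, samp.exp, samp.unif)
dist <- c(rep("norm", 1000), rep("exp", 1000), rep("unif", 1000))

# data.frame() keeps samp numeric; no cbind() in between
DATA <- data.frame(samp, dist)

# Kept from the question: makes dist a factor regardless of R version
DATA$dist <- as.factor(DATA$dist)

# Bare column names inside aes(); ggplot looks them up in DATA
p <- ggplot(DATA, aes(x = dist, y = samp)) + geom_boxplot()
p
```

This should give three boxes, one per distribution, because samp is now a continuous y variable.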

Upvotes: 3

gung - Reinstate Monica

Reputation: 11893

+1 to @PeterEllis. Note that you can also get even simpler than his suggestion with:

boxplot(samp~dist)
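The one-liner above relies on samp and dist existing as vectors in your workspace. If you would rather work from the data frame, a sketch using the data argument of boxplot() could look like this (the rebuilt DATA and the set.seed() call are only there to keep the example self-contained):

```r
set.seed(1)  # added only so the sketch is reproducible

# Rebuild the data frame from the question's three samples
samp <- c(rnorm(1000, 0, 1), rexp(1000, 1), runif(1000))
dist <- c(rep("norm", 1000), rep("exp", 1000), rep("unif", 1000))
DATA <- data.frame(samp, dist)

# Formula interface of base graphics: response ~ grouping variable,
# with both names looked up in DATA
b <- boxplot(samp ~ dist, data = DATA)

# boxplot() invisibly returns the summary statistics it drew
b$names
```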


Upvotes: 0
