Reputation: 391
I have two vectors of different lengths and want to create violin plots for them. What I am currently doing is to cbind them, which causes the shorter vector to be repeated (recycled) until it matches the length of the longer vector (the default behaviour of cbind in R).
library(ggplot2)
C1 <- rnorm(100)
C2 <- rnorm(500)
dat <- cbind(C1,C2)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
pp <- ggplot(mat, aes(x = variable, y = value)) +
  geom_violin(scale = "width", adjust = 1, width = 0.5, fill = "gray80")
pp
Would this affect the shape of the violin? Is there a more correct way of creating the violin plots without having to artificially increase the length of one of them?
Upvotes: 1
Views: 1907
Reputation: 3597
If you're interested in the relative distributions and not the magnitude then @camille's answer is ideal.
Consider, however, that you may wish, or it may be more appropriate, to show the relative sizes of the two vectors in the plot.
There are other ways, but the best I have found is the ggbeeswarm package.
Prep data using camille's code:
library(ggplot2)
library(ggbeeswarm)  # provides geom_quasirandom() and geom_beeswarm()
set.seed(710)
C1 <- data.frame(value = rnorm(100), variable = "C1")
C2 <- data.frame(value = rnorm(500), variable = "C2")
dat <- rbind(C1, C2)
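# Quasirandom jitter; varwidth = TRUE scales each group's width by its size.
# groupOnX = FALSE spreads points along the y axis (newer ggbeeswarm versions
# may warn that this argument is deprecated).
ggplot(dat, aes(x = value, y = variable, col = as.factor(variable))) +
  geom_quasirandom(groupOnX = FALSE, varwidth = TRUE)

# Beeswarm layout: every point is drawn without overlap
ggplot(dat, aes(x = value, y = variable, col = as.factor(variable))) +
  geom_beeswarm(groupOnX = FALSE)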
There are lots of other options; see the documentation.
Are these still violin plots? Perhaps not, but they represent the data more clearly to the viewer.
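If you would rather stay with geom_violin, one of the "other ways" is probably its scale argument: as far as I know, scale = "count" makes each violin's area proportional to its number of observations, so the 500-value group is drawn larger than the 100-value group. A minimal sketch reusing the data prep above (the object name p_count is just illustrative):

```r
library(ggplot2)

set.seed(710)
C1 <- data.frame(value = rnorm(100), variable = "C1")
C2 <- data.frame(value = rnorm(500), variable = "C2")
dat <- rbind(C1, C2)

# scale = "count": violin areas are scaled in proportion to group size,
# so C2 (n = 500) is drawn larger than C1 (n = 100)
p_count <- ggplot(dat, aes(x = variable, y = value)) +
  geom_violin(scale = "count", fill = "gray80")
p_count
```

This keeps the familiar violin shape while still hinting at the difference in sample size, at the cost of the smaller group becoming quite thin.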
Upvotes: 1
Reputation: 16832
Rather than cbinding two vectors with different lengths, which will cause recycling, and then melting, make two data frames where you mark what each represents and rbind them. That way you start out with data in the shape that ggplot expects, and don't run the risk of repeating values from the shorter of the two sets of data.
library(ggplot2)
set.seed(710)
C1 <- data.frame(value = rnorm(100), variable = "C1")
C2 <- data.frame(value = rnorm(500), variable = "C2")
dat <- rbind(C1, C2)
head(dat)
#>         value variable
#> 1 -0.97642446       C1
#> 2 -0.51938107       C1
#> 3  1.05793223       C1
#> 4 -0.88139935       C1
#> 5 -0.05997154       C1
#> 6  0.31960235       C1
ggplot(dat, aes(x = variable, y = value)) +
  geom_violin(scale = "width", adjust = 1, width = 0.5)
Created on 2018-07-11 by the reprex package (v0.2.0).
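As for whether recycling would affect the shape of the violin: I believe it would, at least slightly. cbind repeats C1 five times, and the default bandwidth rule used for the density estimate (bw.nrd0, as far as I know) shrinks as the number of observations grows, so the recycled C1 is smoothed as if it had 500 independent values. A small base-R sketch of that effect (the variable names are only illustrative):

```r
set.seed(710)
C1 <- rnorm(100)
C2 <- rnorm(500)

# cbind silently recycles the shorter vector to the length of the longer one
dat <- cbind(C1, C2)
dim(dat)                                     # 500 rows, 2 columns
identical(unname(dat[, "C1"]), rep(C1, 5))   # TRUE: C1 is repeated five times

# The rule-of-thumb bandwidth depends on n, so the recycled column gets a
# narrower kernel than the original 100 values would
bw_original <- bw.nrd0(C1)
bw_recycled <- bw.nrd0(dat[, "C1"])
c(original = bw_original, recycled = bw_recycled)
```

A narrower bandwidth tends to make the recycled violin look bumpier than the data justify, which is one more reason to prefer the rbind approach above.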
Upvotes: 3