Snoop Dogg

Reputation: 391

Creating violin plots on ggplot2 for vectors of different length?

I have two vectors of different lengths and want to create violin plots for them. What I am currently doing is to cbind them, which causes the shorter vector to be recycled (repeated) until it matches the length of the longer one; this is cbind's default behavior in R.

library(ggplot2)

C1 <- rnorm(100)
C2 <- rnorm(500)

dat <- cbind(C1,C2)

# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
pp <- ggplot(mat, aes(x = variable, y = value)) +
  geom_violin(scale = "width", adjust = 1, width = 0.5, fill = "gray80")
pp

Would this affect the shape of the violin? Is there a more correct way of creating the violin plots without having to artificially increase the length of one of them?

Upvotes: 1

Views: 1907

Answers (2)

Cyrille

Reputation: 3597

If you're interested in the relative distributions and not the magnitude then @camille's answer is ideal.

Consider, however, that you may wish (or it may be more appropriate) to show the relative sizes of the two vectors in the plot.

There are other ways, but the best I have found is the ggbeeswarm package.

Prep data using camille's code:

library(ggplot2)
library(ggbeeswarm)  # provides geom_quasirandom() and geom_beeswarm()

set.seed(710)
C1 <- data.frame(value = rnorm(100), variable = "C1")
C2 <- data.frame(value = rnorm(500), variable = "C2")

dat <- rbind(C1, C2)

Example 1

ggplot(dat, aes(x=value, y=variable, col=as.factor(variable))) +
  geom_quasirandom(groupOnX = FALSE, varwidth = TRUE)

[Figure: output of geom_quasirandom]

Example 2

ggplot(dat, aes(x=value, y=variable, col=as.factor(variable))) +
  geom_beeswarm(groupOnX = FALSE)

[Figure: output of geom_beeswarm]

There are lots of other options; see the ggbeeswarm documentation.

Are these still violin plots? Perhaps not, but they represent the data more clearly to the viewer.
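If you would rather stay with an actual violin plot, one of the "other ways" is probably the `scale` argument of `geom_violin()`: with `scale = "count"`, ggplot2 scales each violin's area in proportion to its number of observations. A minimal sketch, reusing the same simulated data (the object name `p` and the fill colour are just illustrative choices):

```r
library(ggplot2)

set.seed(710)
dat <- rbind(
  data.frame(value = rnorm(100), variable = "C1"),
  data.frame(value = rnorm(500), variable = "C2")
)

# scale = "count": violin areas are proportional to the number of observations,
# so the 500-point group should be drawn visibly larger than the 100-point group
p <- ggplot(dat, aes(x = variable, y = value)) +
  geom_violin(scale = "count", fill = "gray80")
p
```

This keeps the density shape of each group while still hinting at the difference in sample size, though it does not show individual points the way the beeswarm plots do.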

Upvotes: 1

camille

Reputation: 16832

Rather than cbinding two vectors with different lengths, which will cause recycling, and then melting, make two data frames where you mark what each represents and rbind them. That way you start out with data in the shape that ggplot expects, and don't run the risk of repeating values from the shorter of the two sets of data.

library(ggplot2)

set.seed(710)
C1 <- data.frame(value = rnorm(100), variable = "C1")
C2 <- data.frame(value = rnorm(500), variable = "C2")

dat <- rbind(C1, C2)
head(dat)
#>         value variable
#> 1 -0.97642446       C1
#> 2 -0.51938107       C1
#> 3  1.05793223       C1
#> 4 -0.88139935       C1
#> 5 -0.05997154       C1
#> 6  0.31960235       C1

ggplot(dat, aes(x = variable, y = value)) +
  geom_violin(scale = "width", adjust = 1, width = 0.5)

Created on 2018-07-11 by the reprex package (v0.2.0).
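To see the recycling for yourself, here is a small check (a sketch only; the names `wide` and `long` are just for illustration). After cbind, C1 occupies 500 rows and its 100 values simply repeat, whereas the rbind approach keeps the true group sizes:

```r
set.seed(710)
C1 <- rnorm(100)
C2 <- rnorm(500)

# cbind recycles the shorter vector to the length of the longer one
wide <- cbind(C1, C2)
nrow(wide)                                         # 500
identical(wide[1:100, "C1"], wide[101:200, "C1"])  # TRUE: the same values repeat

# rbind of labelled data frames keeps each group at its real size
long <- rbind(
  data.frame(value = C1, variable = "C1"),
  data.frame(value = C2, variable = "C2")
)
table(long$variable)                               # C1: 100, C2: 500
```

As for whether recycling changes the shape: it can. The density estimate behind the violin treats the repeated values as if they were 500 real observations, and since the default bandwidth shrinks as the sample size grows, the C1 violin will likely come out less smooth than it would from the 100 genuine points. If the longer length were not a multiple of the shorter one, some values would also be repeated more often than others.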

Upvotes: 3
