user102162
user102162

Reputation: 852

Split violin plot with ggplot2

I'd like to create a split violin density plot using ggplot, like the fourth example on this page of the seaborn documentation.

enter image description here

Here is some data:

set.seed(20160229)

my_data = data.frame(
    y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
    x=c(rep('a', 2000), rep('b', 2000)),
    m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

I can plot dodged violins like this:

library('ggplot2')

ggplot(my_data, aes(x, y, fill=m)) +
  geom_violin()

enter image description here

But it's hard to visually compare the widths at different points in the side-by-side distributions. I haven't been able to find any examples of split violins in ggplot - is it possible?

I found a base R graphics solution but the function is quite long and I want to highlight distribution modes, which are easy to add as additional layers in ggplot but will be harder to do if I need to figure out how to edit that function.

Upvotes: 66

Views: 46005

Answers (5)

Quinten
Quinten

Reputation: 41275

It is now possible to do this with the introdataviz package using the geom_split_violin function, which makes it really easy to create these plots. Here is a reproducible example:

set.seed(20160229)
my_data = data.frame(
  y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x=c(rep('a', 2000), rep('b', 2000)),
  m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

library(ggplot2)
# devtools::install_github("psyteachr/introdataviz")
library(introdataviz)

ggplot(my_data, aes(x = x, y = y, fill = m)) +
  geom_split_violin()

Created on 2022-08-24 with reprex v2.0.2

As you can see, it creates a split violin plot. If you want more information and a tutorial of this package, check the link above.

Upvotes: 5

Yoann Pageaud
Yoann Pageaud

Reputation: 519

I would also like to share my humble contribution to this post. I have been working for quite some time already on a similar plot that I called a craviola plot (in reference to the asymmetrical "Giannini Craviola" guitar).
So a craviola plot is in essence a splitted violin plot.
By default, it looks like this: enter image description here Distributions are shown by the splitted violins. They are accompanied by boxplots, 1 box per distribution. And by a red dot symbolizing the mean of each distribution (to be clear, red dots do not always superimpose with the median line of boxplots, it just happen to be the case in my example).

This function is part of my visualization package BiocompR which you can conveniently install from the following Github repository: https://github.com/YoannPa/BiocompR
To install the package in R do: devtools::install_github("YoannPa/BiocompR")

In order to generate such plot you must use the ggcraviola() function. The documentation I wrote for the ggcraviola() function already provide some examples that you can run yourself in RStudio to see its full potential:

library(BiocompR)
?ggcraviola

To reproduce the plot above you can run the following code:

library(BiocompR)
    
df.complete = data.frame(
        Groups = rep(c('A', 'B', 'C'), each = 2000),
        Conditions = rep(c('I', 'J'), each = 1000,3),
        Values = c(rnorm(1000, 0), rnorm(1000, 0.5),
                   rnorm(1000, 3), rnorm(1000, 3.5),
                   rnorm(1000, -3), rnorm(1000, -3.5)))

ggcraviola(data = df.complete, lines.col = "black")

Of course you have the freedom to show/hide the boxplots and the mean (red dots) for all distributions. ggcraviola() works with ggplot2, so you can also add more component of customization after using the ggcraviola() function like this:

library(BiocompR)

df.complete = data.frame(
  Groups = rep(c('A', 'B', 'C'), each = 2000),
  Conditions = rep(c('I', 'J'), each = 1000,3),
  Values = c(rnorm(1000, 0), rnorm(1000, 0.5),
             rnorm(1000, 3), rnorm(1000, 3.5),
             rnorm(1000, -3), rnorm(1000, -3.5)))

ggcraviola(data = df.complete, lines.col = "black") +
  ggtitle("This is a Craviola plot!") + # Add title
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size = 14, color = "black"),# Custom axis text
        axis.title = element_text(size = 15),
        legend.title = element_text(size = 13), # Change legend font size
        legend.text = element_text(size = 12),
        panel.background = element_blank(), # Change panel appearance
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(color = "grey"),
        panel.grid.minor.y = element_line(color = "grey")) +
  scale_y_continuous(expand = c(0, 0)) + #Expand fully plot panel on Y-axis
  scale_fill_manual(
    labels = c("Control", "Case"), # Rename conditions
    values = biopalette(name = "BiocompR_cond3", mute = TRUE))

Which gives: enter image description here Hopefully someone will find the ggcraviola() function useful!
Knowing how related this topic is to the ggcraviola() function, I decided to reference it in the documentation of my package.

Upvotes: 2

Trang Q. Nguyen
Trang Q. Nguyen

Reputation: 139

@jan-jlx's solution is wonderful. For densities with thin tails, I'd like to insert a little space between the two halves of the violin so the tails are easier to tell apart. Here's a slight modification of @jan-jlx's code to do this, borrowing the nudge parameter from the gghalves package:

GeomSplitViolin <- ggplot2::ggproto(
    "GeomSplitViolin",
    ggplot2::GeomViolin,
    draw_group = function(self,
                          data,
                          ...,
                          # add the nudge here
                          nudge = 0,
                          draw_quantiles = NULL) {
        data <- transform(data,
                          xminv = x - violinwidth * (x - xmin),
                          xmaxv = x + violinwidth * (xmax - x))
        grp <- data[1, "group"]
        newdata <- plyr::arrange(transform(data,
                                           x = if (grp %% 2 == 1) xminv else xmaxv),
                                 if (grp %% 2 == 1) y else -y)
        newdata <- rbind(newdata[1, ],
                         newdata,
                         newdata[nrow(newdata), ],
                         newdata[1, ])
        newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

        # now nudge them apart
        newdata$x <- ifelse(newdata$group %% 2 == 1,
                            newdata$x - nudge,
                            newdata$x + nudge)

        if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {

            stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))

            quantiles <- ggplot2:::create_quantile_segment_frame(data,
                                                             draw_quantiles)
            aesthetics <- data[rep(1, nrow(quantiles)),
                               setdiff(names(data), c("x", "y")),
                               drop = FALSE]
            aesthetics$alpha <- rep(1, nrow(quantiles))
            both <- cbind(quantiles, aesthetics)
            quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
            ggplot2:::ggname("geom_split_violin",
                             grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
                                            quantile_grob))
        }
    else {
            ggplot2:::ggname("geom_split_violin",
                             ggplot2::GeomPolygon$draw_panel(newdata, ...))
        }
    }
)

geom_split_violin <- function(mapping = NULL,
                              data = NULL,
                              stat = "ydensity",
                              position = "identity",
                              # nudge param here
                              nudge = 0,
                              ...,
                              draw_quantiles = NULL,
                              trim = TRUE,
                              scale = "area",
                              na.rm = FALSE,
                              show.legend = NA,
                              inherit.aes = TRUE) {

    ggplot2::layer(data = data,
                   mapping = mapping,
                   stat = stat,
                   geom = GeomSplitViolin,
                   position = position,
                   show.legend = show.legend,
                   inherit.aes = inherit.aes,
                   params = list(trim = trim,
                                 scale = scale,
                                 # don't forget the nudge
                                 nudge = nudge,
                                 draw_quantiles = draw_quantiles,
                                 na.rm = na.rm,
                                 ...))
}

Here's a plot I get with geom_split_violin(nudge = 0.02).

enter image description here

Upvotes: 2

Axeman
Axeman

Reputation: 35242

Note: I think the answer by jan-glx is much better, and most people should use that instead. But sometimes, the manual approach is still helpful to do weird things.


You can achieve this by calculating the densities yourself beforehand, and then plotting polygons. See below for a rough idea.

Get densities

library(dplyr)
pdat <- my_data %>%
  group_by(x, m) %>%
  do(data.frame(loc = density(.$y)$x,
                dens = density(.$y)$y))

Flip and offset densities for the groups

pdat$dens <- ifelse(pdat$m == 'i', pdat$dens * -1, pdat$dens)
pdat$dens <- ifelse(pdat$x == 'b', pdat$dens + 1, pdat$dens)

Plot

ggplot(pdat, aes(dens, loc, fill = m, group = interaction(m, x))) + 
  geom_polygon() +
  scale_x_continuous(breaks = 0:1, labels = c('a', 'b')) +
  ylab('density') +
  theme_minimal() +
  theme(axis.title.x = element_blank())

Result

enter image description here

Upvotes: 57

jan-glx
jan-glx

Reputation: 9496

Or, to avoid fiddling with the densities, you could extend ggplot2's GeomViolin like this:

GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, 
                           draw_group = function(self, data, ..., draw_quantiles = NULL) {
  data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x))
  grp <- data[1, "group"]
  newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv), if (grp %% 2 == 1) y else -y)
  newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
  newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

  if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
    stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <=
      1))
    quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
    aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
    aesthetics$alpha <- rep(1, nrow(quantiles))
    both <- cbind(quantiles, aesthetics)
    quantile_grob <- GeomPath$draw_panel(both, ...)
    ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
  }
  else {
    ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
  }
})

geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., 
                              draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, 
                              show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
        params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}

And use the new geom_split_violin like this:

ggplot(my_data, aes(x, y, fill = m)) + geom_split_violin()

enter image description here

Upvotes: 89

Related Questions