Reputation: 852
I'd like to create a split violin density plot using ggplot, like the fourth example on this page of the seaborn documentation.
Here is some data:
set.seed(20160229)
my_data = data.frame(
y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
x=c(rep('a', 2000), rep('b', 2000)),
m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)
I can plot dodged violins like this:
library('ggplot2')
ggplot(my_data, aes(x, y, fill=m)) +
geom_violin()
But it's hard to visually compare the widths at different points in the side-by-side distributions. I haven't been able to find any examples of split violins in ggplot - is it possible?
I found a base R graphics solution but the function is quite long and I want to highlight distribution modes, which are easy to add as additional layers in ggplot but will be harder to do if I need to figure out how to edit that function.
Upvotes: 66
Views: 46005
Reputation: 41275
It is now possible to do this with the introdataviz package using the geom_split_violin
function, which makes it really easy to create these plots. Here is a reproducible example:
set.seed(20160229)
my_data = data.frame(
y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
x=c(rep('a', 2000), rep('b', 2000)),
m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)
library(ggplot2)
# devtools::install_github("psyteachr/introdataviz")
library(introdataviz)
ggplot(my_data, aes(x = x, y = y, fill = m)) +
geom_split_violin()
Created on 2022-08-24 with reprex v2.0.2
As you can see, it creates a split violin plot. If you want more information and a tutorial of this package, check the link above.
Upvotes: 5
Reputation: 519
I would also like to share my humble contribution to this post.
I have been working for quite some time already on a similar plot that I called a craviola plot (in reference to the asymmetrical "Giannini Craviola" guitar).
So a craviola plot is in essence a splitted violin plot.
By default, it looks like this:
Distributions are shown by the splitted violins. They are accompanied by boxplots, 1 box per distribution. And by a red dot symbolizing the mean of each distribution (to be clear, red dots do not always superimpose with the median line of boxplots, it just happen to be the case in my example).
This function is part of my visualization package BiocompR which you can conveniently install from the following Github repository: https://github.com/YoannPa/BiocompR
To install the package in R do: devtools::install_github("YoannPa/BiocompR")
In order to generate such plot you must use the ggcraviola()
function.
The documentation I wrote for the ggcraviola()
function already provide some examples that you can run yourself in RStudio to see its full potential:
library(BiocompR)
?ggcraviola
To reproduce the plot above you can run the following code:
library(BiocompR)
df.complete = data.frame(
Groups = rep(c('A', 'B', 'C'), each = 2000),
Conditions = rep(c('I', 'J'), each = 1000,3),
Values = c(rnorm(1000, 0), rnorm(1000, 0.5),
rnorm(1000, 3), rnorm(1000, 3.5),
rnorm(1000, -3), rnorm(1000, -3.5)))
ggcraviola(data = df.complete, lines.col = "black")
Of course you have the freedom to show/hide the boxplots and the mean (red dots) for all distributions.
ggcraviola()
works with ggplot2, so you can also add more component of customization after using the ggcraviola()
function like this:
library(BiocompR)
df.complete = data.frame(
Groups = rep(c('A', 'B', 'C'), each = 2000),
Conditions = rep(c('I', 'J'), each = 1000,3),
Values = c(rnorm(1000, 0), rnorm(1000, 0.5),
rnorm(1000, 3), rnorm(1000, 3.5),
rnorm(1000, -3), rnorm(1000, -3.5)))
ggcraviola(data = df.complete, lines.col = "black") +
ggtitle("This is a Craviola plot!") + # Add title
theme(plot.title = element_text(hjust = 0.5),
axis.text = element_text(size = 14, color = "black"),# Custom axis text
axis.title = element_text(size = 15),
legend.title = element_text(size = 13), # Change legend font size
legend.text = element_text(size = 12),
panel.background = element_blank(), # Change panel appearance
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(color = "grey"),
panel.grid.minor.y = element_line(color = "grey")) +
scale_y_continuous(expand = c(0, 0)) + #Expand fully plot panel on Y-axis
scale_fill_manual(
labels = c("Control", "Case"), # Rename conditions
values = biopalette(name = "BiocompR_cond3", mute = TRUE))
Which gives:
Hopefully someone will find the ggcraviola() function useful!
Knowing how related this topic is to the ggcraviola() function, I decided to reference it in the documentation of my package.
Upvotes: 2
Reputation: 139
@jan-jlx's solution is wonderful. For densities with thin tails, I'd like to insert a little space between the two halves of the violin so the tails are easier to tell apart. Here's a slight modification of @jan-jlx's code to do this, borrowing the nudge parameter from the gghalves package:
GeomSplitViolin <- ggplot2::ggproto(
"GeomSplitViolin",
ggplot2::GeomViolin,
draw_group = function(self,
data,
...,
# add the nudge here
nudge = 0,
draw_quantiles = NULL) {
data <- transform(data,
xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
grp <- data[1, "group"]
newdata <- plyr::arrange(transform(data,
x = if (grp %% 2 == 1) xminv else xmaxv),
if (grp %% 2 == 1) y else -y)
newdata <- rbind(newdata[1, ],
newdata,
newdata[nrow(newdata), ],
newdata[1, ])
newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])
# now nudge them apart
newdata$x <- ifelse(newdata$group %% 2 == 1,
newdata$x - nudge,
newdata$x + nudge)
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
quantiles <- ggplot2:::create_quantile_segment_frame(data,
draw_quantiles)
aesthetics <- data[rep(1, nrow(quantiles)),
setdiff(names(data), c("x", "y")),
drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
ggplot2:::ggname("geom_split_violin",
grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
quantile_grob))
}
else {
ggplot2:::ggname("geom_split_violin",
ggplot2::GeomPolygon$draw_panel(newdata, ...))
}
}
)
geom_split_violin <- function(mapping = NULL,
data = NULL,
stat = "ydensity",
position = "identity",
# nudge param here
nudge = 0,
...,
draw_quantiles = NULL,
trim = TRUE,
scale = "area",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
ggplot2::layer(data = data,
mapping = mapping,
stat = stat,
geom = GeomSplitViolin,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(trim = trim,
scale = scale,
# don't forget the nudge
nudge = nudge,
draw_quantiles = draw_quantiles,
na.rm = na.rm,
...))
}
Here's a plot I get with geom_split_violin(nudge = 0.02)
.
Upvotes: 2
Reputation: 35242
Note: I think the answer by jan-glx is much better, and most people should use that instead. But sometimes, the manual approach is still helpful to do weird things.
You can achieve this by calculating the densities yourself beforehand, and then plotting polygons. See below for a rough idea.
library(dplyr)
pdat <- my_data %>%
group_by(x, m) %>%
do(data.frame(loc = density(.$y)$x,
dens = density(.$y)$y))
pdat$dens <- ifelse(pdat$m == 'i', pdat$dens * -1, pdat$dens)
pdat$dens <- ifelse(pdat$x == 'b', pdat$dens + 1, pdat$dens)
ggplot(pdat, aes(dens, loc, fill = m, group = interaction(m, x))) +
geom_polygon() +
scale_x_continuous(breaks = 0:1, labels = c('a', 'b')) +
ylab('density') +
theme_minimal() +
theme(axis.title.x = element_blank())
Upvotes: 57
Reputation: 9496
Or, to avoid fiddling with the densities, you could extend ggplot2
's GeomViolin like this:
GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin,
draw_group = function(self, data, ..., draw_quantiles = NULL) {
data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x))
grp <- data[1, "group"]
newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv), if (grp %% 2 == 1) y else -y)
newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"])
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <=
1))
quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
quantile_grob <- GeomPath$draw_panel(both, ...)
ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
}
else {
ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
}
})
geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ...,
draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE,
show.legend = NA, inherit.aes = TRUE) {
layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}
And use the new geom_split_violin
like this:
ggplot(my_data, aes(x, y, fill = m)) + geom_split_violin()
Upvotes: 89