Reputation: 1327
I'm trying to produce a variation of a grouped violin plot in R (preferably using ggplot2
), similar to the one below:
which was produced by the following reproducible example code:
# Load libraries #
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
Y = rgamma(n = 200, shape = 2, rate = 2),
Z = rep(c("Za", "Zb"), rep = 100),
stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
ggplot(., aes(x = X, y = Y, fill = Z)) +
geom_violin(draw_quantiles = 0.5) +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
The variation I'd like to have is that the density above the median should have a different shade compared to the density below the median, as in the following plot:
I produced the above (single) violin plot for the combination X = X1
and Z = Za
in the data, using the following code:
## Shaded violin plot ##
# Calculate limits and median #
df.lim <- df %>%
filter(X == "X1", Z == "Za") %>%
summarise(Y_min = min(Y),
Y_qnt = quantile(Y, 0.5),
Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category #
df.dens <- df %>%
filter(X == "X1", Z == "Za") %>%
do(data.frame(LOC = density(.$Y)$x,
DENS = density(.$Y)$y)) %>%
filter(LOC >= df.lim$Y_min, LOC <= df.lim$Y_max) %>%
mutate(COL = ifelse(LOC > df.lim$Y_qnt, "Empty", "Filled"))
# Find density values at limits #
df.lim.2 <- df.dens %>%
filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded single violin plot #
df.dens %>%
ggplot(aes(x = LOC)) +
geom_area(aes(y = DENS, alpha = COL), fill = "red") +
geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
geom_path(aes(y = DENS)) +
geom_path(aes(y = -DENS)) +
geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
coord_flip() +
scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
As you will notice in the code, I'm building the violin plot from scratch using the density
function horizontally and then flipping the axes. The problem arises when I try to produce a grouped violin plot mainly because the axis in which the groups X
and Z
will appear, is already used for the "height" of the density. I did try to reach the same result by repeating all the calculations by groups but I'm stuck in the final step:
## Shaded grouped violin plot ##
# Calculate limits and median by group #
df.lim <- df %>%
group_by(X, Z) %>%
summarise(Y_min = min(Y),
Y_qnt = quantile(Y, 0.5),
Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category by group #
df.dens <- df %>%
group_by(X, Z) %>%
do(data.frame(LOC = density(.$Y)$x,
DENS = density(.$Y)$y)) %>%
left_join(., df.lim, by = c("X", "Z")) %>%
filter(LOC >= Y_min, LOC <= Y_max) %>%
mutate(COL = ifelse(LOC > Y_qnt, "Empty", "Filled"))
# Find density values at limits by group #
df.lim.2 <- df.dens %>%
group_by(X, Z) %>%
filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded grouped violin plot #
df.dens %>%
ggplot(aes(x = LOC, group = interaction(X, Z))) +
# The following two lines don't work when included #
#geom_area(aes(y = DENS, alpha = COL), fill = "red") +
#geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
geom_path(aes(y = DENS)) +
geom_path(aes(y = -DENS)) +
geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
coord_flip() +
scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
Running the code above will produce the outline of the violin plots for each group, each one on top of the other. But once I try to include the geom_area
lines, the code fails.
My gut feeling tells me that I would need to somehow produce the "shaded" violin plot as a new geom
which can then be used under the general structure of ggplot2
graphics but I have no idea how to do that, as my coding skills don't extend that far. Any help or pointers, either along my line of thought or in a different direction would be much appreciated. Thank you for your time.
Upvotes: 5
Views: 788
Reputation: 20329
Idea
For the fun of it, I hacked a quick half-violin geom. It is basically a lot of copy & paste from GeomViolin
and in order to make it run I had to access some of the internal ggplot2
function, which are not exported via :::
which means that this solution may not run in the future (if the ggplot
team decides to change their internal functions).
However, this solution works and you can specify the alpha level of both the upper and the lower part. The geom
assumes that you are providing just one quantile. The code is only superficially tested but it gives you an idea of how this can be done. As said it is in large part a simple copy & paste from GeomViolin
where I added some code which finds out which values are below and above the quantile and splits the underlying GeomPolygon
in 2 parts, as this function uses just a single alpha value. It works with groups
and coord_flip
likewise.
Code
library(grid)
GeomHalfViolin <- ggproto("GeomHalfViolin", GeomViolin,
draw_group = function (self, data, ..., draw_quantiles = NULL,
alpha_upper = .5, alpha_lower = 1) {
data <- transform(data, xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
newdata <- rbind(transform(data, x = xminv)[order(data$y),
], transform(data, x = xmaxv)[order(data$y, decreasing = TRUE),
])
newdata <- rbind(newdata, newdata[1, ])
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <=
1))
stopifnot(length(draw_quantiles) <= 1)
## need to add ggplot::: to access ggplot2 internal functions here and there
quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
###------------------------------------------------
## find out where the quantile is supposed to be
quantile_line <- unique(quantiles$y)
## which y values are below this quantile?
ind <- newdata$y <= quantile_line
## set the alpha values accordingly
newdata$alpha[!ind] <- alpha_upper
newdata$alpha[ind] <- alpha_lower
###------------------------------------------------
aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data),
c("x", "y", "group")), drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
both <- both[!is.na(both$group), , drop = FALSE]
quantile_grob <- if (nrow(both) == 0) {
zeroGrob()
}
else {
GeomPath$draw_panel(both, ...)
}
###------------------------------------------------
## GeomPolygon uses a single alpha value by default
## Hence, split the violin in two parts
ggplot2:::ggname("geom_half_violin",
grobTree(GeomPolygon$draw_panel(newdata[ind, ], ...),
GeomPolygon$draw_panel(newdata[!ind, ], ...),
quantile_grob))
###------------------------------------------------
}
else {
ggplot2:::ggname("geom_half_violin", GeomPolygon$draw_panel(newdata,
...))
}
}
)
geom_half_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
position = "dodge", ..., draw_quantiles = NULL,
alpha_upper = .5, alpha_lower = 1,
trim = TRUE, scale = "area",
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
layer(data = data, mapping = mapping, stat = stat, geom = GeomHalfViolin,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles,
alpha_lower = alpha_lower, alpha_upper = alpha_upper,
na.rm = na.rm, ...))
}
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
Y = rgamma(n = 200, shape = 2, rate = 2),
Z = rep(c("Za", "Zb"), rep = 100),
stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
ggplot(., aes(x = X, y = Y, fill = Z)) +
geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1) +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
# no groups
df %>% filter(Z == "Za") %>%
ggplot(., aes(x = X, y = Y)) +
geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1, fill = "red") +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) +
coord_flip()
Graphs
Upvotes: 2