Reputation: 61
I want to gradient fill a violin plot based on the density of points in the bins (blue for highest density and red for lowest).
I have generated a plot using the following commands but failed to color it based on density (in this case the width of the violin. I also would like to generate box plots with similar coloring).
library("ggplot2")
data(diamonds)
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()
Upvotes: 6
Views: 3405
Reputation: 147
Following up to the answer from @tjebo and the request from @CyG I simplified the code for generating a violin plot with gradient fill. To do this i re-used geom_violin
to generate the outline and adjusted the mywidth
parameter to compensate for the difference in width.
The only tricky part was to ensure that separate violin plots are generated and that the x-axis appears discrete, even though the x-axis is continuous (required for geom_segment). To do this I set the fill aesthetics of geom_violin
to the categorical variable cut
and converted cut
to an integer to set x
in aes()
. (Source: change x location of violin plot in ggplot2)
library(tidyverse)
library(viridisLite)
mywidth <- .45 # bit of trial and error
df_input <- diamonds
p <- ggplot(df_input, aes(x=cut,y=carat)) + geom_violin()
# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
mutate(xnew = x - mywidth * violinwidth, xend = x + mywidth * violinwidth)
breaks <- unique(as.integer(df_input$cut))
labels <- unique(df_input$cut)
ggplot() +
geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
color = violinwidth), show.legend = FALSE) +
# Re-use geom_violin to plot the outline
geom_violin(data = df_input, aes(x = as.integer(cut), y = carat, fill = cut),
color = "white", alpha = 0, draw_quantiles = c(0.25, 0.5, 0.75),
show.legend = FALSE) +
scale_x_continuous(breaks = breaks, labels = labels) +
scale_color_viridis_c() +
labs(x = "Cut", y = "Carat")
=== UPDATE - Parametrized Version ===
To visualize arbitrary parameters as color (instead of just violinwidth
), I obtained the range of y values (here: y = carat) from one fill segment to the next, scaled the range up, and calculated the average of the target value (here: price) within this y range. diff_multiplier
parametrizes how much the y range gets scaled up. Higher diff_multiplier
includes more values to be averaged per horizontal segment, therefore smoothing the gradient and reducing missingness, but also increases computational effort.
This code is a bit hacky to be honest and I appreciate any suggestions on how to optimize the runtime (perhaps by vectorizing the for loop).
library(tidyverse)
library(viridisLite)
theme_set(theme_light(base_size = 14))
df_input <- diamonds
p <- ggplot(df_input, aes(x=cut,y=carat)) + geom_violin()
mywidth <- .45 # bit of trial and error
diff_multiplier <- 5
mean_range_value <- function (df, y_low, y_high, current_group = NULL) {
df %>%
mutate(y = carat,
group = as.integer(cut),
value = price) %>%
select(y, group, value) %>%
filter(y > y_low & y <= y_high & group == current_group) %>%
pull(value) %>%
mean(., na.rm = TRUE)
}
mean_range_value2 <- function(df, y_low, y_high, current_group = NULL) {
if (length(y_low) != length(y_high)) stop("y_low and y_high don't have equal length")
results <- list()
for (i in seq_along(y_low)) {
results[i] <- mean_range_value(df, y_low[i], y_high[i], current_group = current_group)
}
return(as.double(as.vector(results)))
}
# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
mutate(xnew = x - mywidth * violinwidth, xend = x + mywidth * violinwidth) %>%
group_by(group) %>%
mutate(y_diff = c(diff(y, 1), last(diff(y, 1))) * diff_multiplier,
y_low = y - y_diff/2,
y_high = y + y_diff/2) %>%
mutate(price = mean_range_value2(df_input, y_low, y_high, first(group))) %>%
mutate(price = if_else(is.finite(price), price, NA))
breaks <- unique(as.integer(df_input$cut))
labels <- unique(df_input$cut)
ggplot() +
geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
color = price)) +
geom_violin(data = df_input, aes(x = as.integer(cut), y = carat, fill = cut),
color = "grey", alpha = 0, draw_quantiles = c(0.25, 0.5, 0.75),
show.legend = FALSE) +
scale_x_continuous(breaks = breaks, labels = labels) +
scale_color_viridis_c() +
labs(x = "Cut", y = "Carat", color = "Price") +
theme(legend.position = "bottom")
Upvotes: 1
Reputation: 23807
Just answered this for another thread, but believe it's possibly more appropriate for this thread. You can create a pseudo-fill by drawing many segments. You can get those directly from the underlying data in the ggplot_built object.
If you want an additional polygon outline ("border"), you'd need to create this from the x/y coordinates. Below one option.
library(tidyverse)
p <- ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()
mywidth <- .35 # bit of trial and error
# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
mutate(xnew = x- mywidth*violinwidth, xend = x+ mywidth*violinwidth)
# the outline is a bit more convoluted, as the order matters
vl_poly <- vl_fill %>%
select(xnew, xend, y, group) %>%
pivot_longer(-c(y, group), names_to = "oldx", values_to = "x") %>%
arrange(y) %>%
split(., .$oldx) %>%
map(., function(x) {
if(all(x$oldx == "xnew")) x <- arrange(x, desc(y))
x
}) %>%
bind_rows()
ggplot() +
geom_polygon(data = vl_poly, aes(x, y, group = group),
color= "black", size = 1, fill = NA) +
geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
color = violinwidth))
Created on 2021-04-14 by the reprex package (v1.0.0)
Upvotes: 3
Reputation: 4102
to change the colour of the violin plot you use fill = variable, like this:
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin(aes(fill=cut))
same goes for boxplot
ggplot(diamonds, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=cut))
but whatever value you have has to have the same value for each cut, that is, if you wanted to use for example mean depth/cut as the color variable you would have to code it.
with dplyr group your diamonds by cut and with summarize get the mean depth (or any other variable)
library(dplyr)
diamonds_group <- group_by(diamonds, cut)
diamonds_group <- summarize(diamonds_group, Mean_Price = mean(price))
Then I used diamonds2 as a copy of diamonds to then manipulate the dataset
diamonds2 <- diamonds
I merge both dataframes to get the Mean_Depth as a variable in diamonds2
diamonds2 <- merge(diamonds2, diamonds_group)
And now I can plot it with mean depth as a color variable
ggplot(diamonds2, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=Mean_Price)) + scale_fill_gradient2(midpoint = mean(diamonds2$price))
Upvotes: 2