user2438149
user2438149

Reputation: 61

gradient fill violin plots using ggplot2

I want to gradient fill a violin plot based on the density of points in the bins (blue for highest density and red for lowest).

I have generated a plot using the following commands but failed to color it based on density (in this case the width of the violin. I also would like to generate box plots with similar coloring).

library("ggplot2")
data(diamonds)

ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin() 

Upvotes: 6

Views: 3405

Answers (3)

eheller
eheller

Reputation: 147

Following up to the answer from @tjebo and the request from @CyG I simplified the code for generating a violin plot with gradient fill. To do this i re-used geom_violin to generate the outline and adjusted the mywidth parameter to compensate for the difference in width.

The only tricky part was to ensure that separate violin plots are generated and that the x-axis appears discrete, even though the x-axis is continuous (required for geom_segment). To do this I set the fill aesthetics of geom_violin to the categorical variable cut and converted cut to an integer to set x in aes(). (Source: change x location of violin plot in ggplot2)

library(tidyverse)
library(viridisLite)

mywidth <- .45 # bit of trial and error

df_input <- diamonds
p <- ggplot(df_input, aes(x=cut,y=carat)) + geom_violin()

# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
  mutate(xnew = x - mywidth * violinwidth, xend = x + mywidth * violinwidth)

breaks <- unique(as.integer(df_input$cut))
labels <- unique(df_input$cut)

ggplot() +
  geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
                                   color = violinwidth), show.legend = FALSE) +
  # Re-use geom_violin to plot the outline
  geom_violin(data = df_input, aes(x = as.integer(cut), y = carat, fill = cut),
              color = "white", alpha = 0, draw_quantiles = c(0.25, 0.5, 0.75),
              show.legend = FALSE) +
  scale_x_continuous(breaks = breaks, labels = labels) +
  scale_color_viridis_c() +
  labs(x = "Cut", y = "Carat")

Violin Plot with Gradient Fill

=== UPDATE - Parametrized Version ===

To visualize arbitrary parameters as color (instead of just violinwidth), I obtained the range of y values (here: y = carat) from one fill segment to the next, scaled the range up, and calculated the average of the target value (here: price) within this y range. diff_multiplier parametrizes how much the y range gets scaled up. Higher diff_multiplier includes more values to be averaged per horizontal segment, therefore smoothing the gradient and reducing missingness, but also increases computational effort.

This code is a bit hacky to be honest and I appreciate any suggestions on how to optimize the runtime (perhaps by vectorizing the for loop).

library(tidyverse)
library(viridisLite)
theme_set(theme_light(base_size = 14))

df_input <- diamonds

p <- ggplot(df_input, aes(x=cut,y=carat)) + geom_violin()

mywidth <- .45 # bit of trial and error
diff_multiplier <- 5

mean_range_value <- function (df, y_low, y_high, current_group = NULL) {
  df %>%
    mutate(y = carat,
           group = as.integer(cut),
           value = price) %>%
    select(y, group, value) %>%
    filter(y > y_low & y <= y_high & group == current_group) %>%
    pull(value) %>%
    mean(., na.rm = TRUE)
}

mean_range_value2 <- function(df, y_low, y_high, current_group = NULL) {
  if (length(y_low) != length(y_high)) stop("y_low and y_high don't have equal length")

  results <- list()
  for (i in seq_along(y_low)) {
    results[i] <- mean_range_value(df, y_low[i], y_high[i], current_group = current_group)
  }
  return(as.double(as.vector(results)))
}

# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
  mutate(xnew = x - mywidth * violinwidth, xend = x + mywidth * violinwidth) %>%
  group_by(group) %>%
  mutate(y_diff = c(diff(y, 1), last(diff(y, 1))) * diff_multiplier,
         y_low = y - y_diff/2,
         y_high = y + y_diff/2) %>%
  mutate(price = mean_range_value2(df_input, y_low, y_high, first(group))) %>%
  mutate(price = if_else(is.finite(price), price, NA))

breaks <- unique(as.integer(df_input$cut))
labels <- unique(df_input$cut)

ggplot() +
  geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
                                   color = price)) +
  geom_violin(data = df_input, aes(x = as.integer(cut), y = carat, fill = cut),
              color = "grey", alpha = 0, draw_quantiles = c(0.25, 0.5, 0.75),
              show.legend = FALSE) +
  scale_x_continuous(breaks = breaks, labels = labels) +
  scale_color_viridis_c() +
  labs(x = "Cut", y = "Carat", color = "Price") +
  theme(legend.position = "bottom")

Violin plot using a separate column in the data frame for the fill gradient.

Upvotes: 1

tjebo
tjebo

Reputation: 23807

Just answered this for another thread, but believe it's possibly more appropriate for this thread. You can create a pseudo-fill by drawing many segments. You can get those directly from the underlying data in the ggplot_built object.

If you want an additional polygon outline ("border"), you'd need to create this from the x/y coordinates. Below one option.

library(tidyverse)

p <- ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin() 

mywidth <- .35 # bit of trial and error

# all you need for the gradient fill 
vl_fill <- data.frame(ggplot_build(p)$data) %>%
  mutate(xnew = x- mywidth*violinwidth, xend = x+ mywidth*violinwidth) 

# the outline is a bit more convoluted, as the order matters
vl_poly <- vl_fill %>%
  select(xnew, xend, y, group) %>%
  pivot_longer(-c(y, group), names_to = "oldx", values_to = "x") %>% 
  arrange(y) %>%
  split(., .$oldx) %>%
  map(., function(x) {
    if(all(x$oldx == "xnew")) x <- arrange(x, desc(y))
    x
    }) %>%
  bind_rows()

ggplot() +
  geom_polygon(data = vl_poly, aes(x, y, group = group), 
               color= "black", size = 1, fill = NA) +  
  geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
                                   color = violinwidth))  

Created on 2021-04-14 by the reprex package (v1.0.0)

Upvotes: 3

Derek Corcoran
Derek Corcoran

Reputation: 4102

to change the colour of the violin plot you use fill = variable, like this:

ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin(aes(fill=cut)) 

same goes for boxplot

ggplot(diamonds, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=cut)) 

enter image description here

but whatever value you have has to have the same value for each cut, that is, if you wanted to use for example mean depth/cut as the color variable you would have to code it.

with dplyr group your diamonds by cut and with summarize get the mean depth (or any other variable)

library(dplyr)
diamonds_group <- group_by(diamonds, cut)
diamonds_group <- summarize(diamonds_group, Mean_Price = mean(price))

Then I used diamonds2 as a copy of diamonds to then manipulate the dataset

diamonds2 <- diamonds

I merge both dataframes to get the Mean_Depth as a variable in diamonds2

diamonds2 <- merge(diamonds2, diamonds_group)

And now I can plot it with mean depth as a color variable

ggplot(diamonds2, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=Mean_Price)) + scale_fill_gradient2(midpoint = mean(diamonds2$price))

enter image description here

Upvotes: 2

Related Questions