Paryl

Reputation: 221

R ggplot2 Specify separate color gradients by group

I'm trying to make separate color gradients for grouped data displayed on the same scatterplot. I've included sample data below. user holds unique user IDs, task holds unique task IDs, days_completion is the time in days at which the task was completed, task_group is the group that the task belongs to, and task_order is the order in which the tasks were made available for users to complete. Each row represents the time at which a user completed a specific task. Because the data is randomly generated, task_order may not line up logically with the other columns, but it should suffice for demonstration.

The resulting plot would have days_completion on the x axis and user on the y axis, with each point from geom_point representing the time in days at which the user completed a task. Each task group would have its own color, in a gradient from dark to light by task_order. For example, task group 1 would be dark red at task_order == 1 and light red at task_order == 7.

Sample code is below:

library(dplyr)
library(forcats)
library(ggplot2)

set.seed(42)  # arbitrary seed so the random sample data is reproducible

test_data <- tibble(user = rep(1:50, 10) %>%
                      as_factor(),
                    task = sample(1:10, 500, replace = TRUE) %>% 
                      as_factor(),
                    days_completion = sample(1:500, 500, replace = FALSE),
                    task_group = sample(1:3, 500, replace = TRUE) %>% 
                      as_factor(),
                    task_order = sample(1:7, 500, replace = TRUE, prob = c(rep(.25,3),.2,.2,.1,.1)) %>% 
                      as_factor()) %>% 
  arrange(days_completion)

#Sample plotting approach; does not work
test_plot <- test_data %>% 
              ggplot(aes(x = days_completion, y = user, color = task)) +
              geom_point() +
              #This seems to be what I need, but I can't figure out how to specify multiple gradients by task_group
              scale_color_gradient()

I know I could manually order the factors and map colors with hex codes, but I'd like something that scales and avoids the manual process. Also, if anyone has suggestions for displaying this data other than as a scatterplot, I'm open to them. The main idea is to use color to detect patterns in completion time. The trends may not show because the data is randomly generated, but that's okay.

Upvotes: 5

Views: 3026

Answers (2)

Paryl

Reputation: 221

My coworker found a solution in another post that requires an additional package called ggnewscale. I still don't know if this can be done only with ggplot2, but this works. I'm still open to alternative plotting suggestions though. The purpose is to detect any trends in day of completion across and within users. Across users is where I expect to see more of a trend, but within could be informative too.

How merge two different scale color gradient with ggplot

library(ggnewscale)

dat1 <- test_data %>% filter(task_group == 1)
dat2 <- test_data %>% filter(task_group == 2)
dat3 <- test_data %>% filter(task_group == 3)

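# task_order is a factor in test_data, so convert it to a number;
# scale_color_gradientn() is a continuous scale and rejects discrete values
ggplot(mapping = aes(x = days_completion, y = user)) +
  geom_point(data = dat1, aes(color = as.integer(as.character(task_order)))) +
  scale_color_gradientn(colors = c('#99000d', '#fee5d9'), name = 'task_order (group 1)') +
  new_scale_color() +
  geom_point(data = dat2, aes(color = as.integer(as.character(task_order)))) +
  scale_color_gradientn(colors = c('#084594', '#4292c6'), name = 'task_order (group 2)') +
  new_scale_color() +
  geom_point(data = dat3, aes(color = as.integer(as.character(task_order)))) +
  # only one color is listed here, so group 3 is drawn in a single shade;
  # add a lighter green as a second color to get a gradient
  scale_color_gradientn(colors = c('#238b45'), name = 'task_order (group 3)')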

Example draft plot
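
With more than a handful of groups, the copy-pasted layers above could probably be generated in a loop instead. This is only a minimal sketch, assuming the test_data from the question. plot_group_gradients and group_ramps are names I made up for illustration, and the color endpoints are placeholders, not anything taken from the linked post:

```r
library(ggplot2)
library(ggnewscale)

# Hypothetical dark-to-light endpoints, one pair per task_group level
group_ramps <- list(
  "1" = c("#99000d", "#fee5d9"),
  "2" = c("#084594", "#c6dbef"),
  "3" = c("#238b45", "#edf8e9")
)

# Adds one point layer and one gradient scale per entry in `ramps`
plot_group_gradients <- function(data, ramps) {
  p <- ggplot(mapping = aes(x = days_completion, y = user))
  for (i in seq_along(ramps)) {
    g <- names(ramps)[i]
    # every group after the first needs a fresh color scale
    if (i > 1) p <- p + new_scale_color()
    p <- p +
      geom_point(
        data = data[data$task_group == g, ],
        # factor -> number, so the continuous gradient accepts it
        aes(color = as.integer(as.character(task_order)))
      ) +
      scale_color_gradientn(
        colors = ramps[[i]],
        name = paste("task_order, group", g)
      )
  }
  p
}
```

Calling plot_group_gradients(test_data, group_ramps) should then give one legend per group. Adding a group only means adding an entry to the list.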

Upvotes: 4

dc37

Reputation: 16178

You can generate your own color scale with colorRampPalette (from grDevices, which R loads by default, so RColorBrewer is not needed) and pass it to scale_color_manual:

# 10 shades, one per level of task
colo <- colorRampPalette(c("darkred", "orangered"))(10)

library(ggplot2)
ggplot(test_data, aes(x = days_completion, y = user))+
  geom_point(aes(color = task))+
  scale_color_manual(values = colo)
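
The same idea might extend to one ramp per task_group using only ggplot2: build a named vector with one shade per group/order combination and map color to interaction(task_group, task_order). This is a rough sketch rather than a tested recipe. make_group_palette, plot_manual_gradients and the ramps list are hypothetical names, the color endpoints are placeholders, and it assumes the task_order levels are integer-like:

```r
library(ggplot2)

# Hypothetical c(dark, light) endpoints, named by task_group level
ramps <- list(
  "1" = c("#99000d", "#fee5d9"),
  "2" = c("#084594", "#c6dbef"),
  "3" = c("#238b45", "#edf8e9")
)

# One dark-to-light ramp per group, one shade per task_order value.
# Names follow the "group.order" labels that interaction() produces.
make_group_palette <- function(ramps, orders) {
  unlist(lapply(names(ramps), function(g) {
    shades <- colorRampPalette(ramps[[g]])(length(orders))
    setNames(shades, paste(g, orders, sep = "."))
  }))
}

plot_manual_gradients <- function(data, ramps) {
  # sorted numeric values of task_order (assumes integer-like levels)
  orders <- sort(unique(as.integer(as.character(data$task_order))))
  ggplot(data, aes(x = days_completion, y = user,
                   color = interaction(task_group, task_order))) +
    geom_point() +
    scale_color_manual(values = make_group_palette(ramps, orders),
                       name = "task_group.task_order")
}
```

plot_manual_gradients(test_data, ramps) would draw everything on a single discrete scale, so the legend gets one entry per group/order combination (21 for the sample data). That may be too long to be useful.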


As for a representation other than a scatterplot, it is difficult to propose something else. It will depend on your original data and the question you are trying to answer. Do you need to see the pattern per user, or are your 50 users just replicates of your experiment? If they are replicates, geom_density could be helpful. Otherwise, you could take a look at the stat_contour function.
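
To make the geom_density suggestion a bit more concrete, here is one possible sketch. It assumes the users can be pooled, and it uses the column names from the question. plot_completion_density is a name made up for illustration:

```r
library(ggplot2)

# Distribution of completion day, pooled over users:
# one panel per task_order, one filled curve per task_group
plot_completion_density <- function(data) {
  ggplot(data, aes(x = days_completion, fill = task_group)) +
    geom_density(alpha = 0.4) +
    facet_wrap(~ task_order)
}
```

plot_completion_density(test_data) would show whether later task orders shift toward later completion days, but it hides any per-user pattern.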

Upvotes: 0
