Reputation: 221
I'm trying to make separate color gradients for grouped data that is displayed on the same scatterplot. I've included sample data below. User is unique user IDs, task is unique task IDs, days_completion is the time in days when the task was completed, task_group is the group indicator that the tasks are grouped into, and task_order is the order in which the tasks were made available for users to complete. Each row represents the time that the user completed a specific task. The task_order may not logically follow this organization as it was randomly generated, but it should suffice for demonstration.
The resulting plot would have days_completion on the x axis and user on the y axis; each point from geom_point would represent the time in days at which the user completed a task. Each task group would have its own color, in a gradient from dark to light by task_order. For example, task group 1 would be dark red at task_order == 1 and light red at task_order == 7.
Sample code is below:
library(dplyr)
library(forcats)
library(ggplot2)
test_data <- tibble(
  user = rep(1:50, 10) %>% as_factor(),
  task = sample(1:10, 500, replace = TRUE) %>% as_factor(),
  days_completion = sample(1:500, 500, replace = FALSE),
  task_group = sample(1:3, 500, replace = TRUE) %>% as_factor(),
  task_order = sample(1:7, 500, replace = TRUE,
                      prob = c(rep(.25, 3), .2, .2, .1, .1)) %>% as_factor()
) %>%
  arrange(days_completion)

# Sample plotting approach; does not work
test_plot <- test_data %>%
  ggplot(aes(x = days_completion, y = user, color = task)) +
  geom_point() +
  # This seems to be what I need, but I can't figure out how to specify multiple gradients by task_group
  scale_color_gradient()
I know I could manually order the factors and map colors with hex codes, but I'd like something that can scale and avoid the manual process. Also, if anyone has any suggestions for how to display this plot other than a scatterplot, I'm open to suggestions. The main idea is to detect patterns in completion time in trends displayed by the color. The trends may not show due to it being randomly generated data, but that's okay.
Upvotes: 5
Views: 3026
Reputation: 221
My coworker found a solution in another post that requires an additional package called ggnewscale. I still don't know if this can be done only with ggplot2, but this works. I'm still open to alternative plotting suggestions though. The purpose is to detect any trends in day of completion across and within users. Across users is where I expect to see more of a trend, but within could be informative too.
How merge two different scale color gradient with ggplot
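library(dplyr)
library(ggplot2)
library(ggnewscale)

# task_order is a factor in test_data; gradient scales need a numeric variable
plot_data <- test_data %>%
  mutate(task_order = as.integer(as.character(task_order)))

dat1 <- plot_data %>% filter(task_group == 1)
dat2 <- plot_data %>% filter(task_group == 2)
dat3 <- plot_data %>% filter(task_group == 3)

ggplot(mapping = aes(x = days_completion, y = user)) +
  geom_point(data = dat1, aes(color = task_order)) +
  scale_color_gradientn(colors = c('#99000d', '#fee5d9')) +
  new_scale_color() +  # start a fresh color scale for the next layer
  geom_point(data = dat2, aes(color = task_order)) +
  scale_color_gradientn(colors = c('#084594', '#4292c6')) +
  new_scale_color() +
  geom_point(data = dat3, aes(color = task_order)) +
  # second (light green) endpoint added here; a single color gives no visible gradient
  scale_color_gradientn(colors = c('#238b45', '#c7e9c0'))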
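If ggnewscale is not an option, a ggplot2-only variant seems possible by computing the hex color of each point up front and plotting with scale_color_identity(). The following is only a minimal sketch under some assumptions: base_cols and shade_for() are made-up names for illustration, the palette endpoints are arbitrary, and a small stand-in data frame (demo, with a numeric task_order) replaces test_data so the snippet runs on its own.

```r
library(ggplot2)

set.seed(1)
# Stand-in for test_data: same columns as the question, numeric task_order
demo <- data.frame(
  user = factor(rep(1:50, 10)),
  days_completion = sample(1:500, 500),
  task_group = factor(sample(1:3, 500, replace = TRUE)),
  task_order = sample(1:7, 500, replace = TRUE)
)

# Hypothetical palette: dark and light endpoint for each task_group
base_cols <- list(
  "1" = c("#99000d", "#fee5d9"),
  "2" = c("#084594", "#c6dbef"),
  "3" = c("#005a32", "#c7e9c0")
)

# Hypothetical helper: hex color for a group/order pair (order 1 = dark, n = light)
shade_for <- function(group, order, n = 7) {
  mapply(
    function(g, o) colorRampPalette(base_cols[[g]])(n)[o],
    as.character(group), order,
    USE.NAMES = FALSE
  )
}

demo$point_col <- shade_for(demo$task_group, demo$task_order)

# scale_color_identity() uses the hex codes in point_col as they are
p <- ggplot(demo, aes(x = days_completion, y = user, color = point_col)) +
  geom_point() +
  scale_color_identity()
```

Adding a group only means adding one entry to base_cols. The trade-off is that scale_color_identity() draws no legend by default, so a legend would have to be built separately.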
Upvotes: 4
Reputation: 16178
You can generate your own color scale with colorRampPalette (part of grDevices, which is attached by default, so RColorBrewer is not strictly needed for this) and pass it to scale_color_manual:
library(RColorBrewer)
colo <- colorRampPalette(c("darkred", "orangered"))(10)
library(ggplot2)
ggplot(test_data, aes(x = days_completion, y = user))+
geom_point(aes(color = task))+
scale_color_manual(values = colo)
Regarding a representation other than a scatterplot, it is difficult to propose something else. It will depend on your original data and the question you are trying to answer. Do you need to see the pattern per user, or are your 50 users just replicates of your experiment? In the latter case, geom_density could be helpful. Otherwise, you could take a look at the stat_contour function.
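As a rough sketch of the geom_density idea (an assumption about how it might apply here, not something checked against the real data), completion times could be drawn as one density curve per task_group, with one facet per task_order. The data frame demo below is a made-up stand-in for test_data so the snippet runs on its own.

```r
library(ggplot2)

set.seed(1)
# Stand-in for test_data with only the columns this plot needs
demo <- data.frame(
  days_completion = sample(1:500, 500),
  task_group = factor(sample(1:3, 500, replace = TRUE)),
  task_order = factor(sample(1:7, 500, replace = TRUE))
)

# One semi-transparent density per task_group, one panel per task_order
p <- ggplot(demo, aes(x = days_completion, fill = task_group)) +
  geom_density(alpha = 0.4) +
  facet_wrap(~ task_order)
```

This drops the per-user axis, so it would only make sense if the users are treated as replicates.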
Upvotes: 0