Reputation: 1875
I have some fake data representing the answering times of different users answering an online survey. The dataset has three variables: the id of the respondent (user), the name of the question (question) and the answering time for each question (time).
library(dplyr)
library(ggplot2)

n <- 1000
dat <- data.frame(user = 1:n,
                  question = sample(paste("q", 1:4, sep = ""), size = n, replace = TRUE),
                  time = round(rnorm(n, mean = 10, sd = 4), 0)
)
pltSingleRespondent <- function(df, highlightUsers){
  # use the df argument rather than the global dat
  df %>%
    ggplot(aes(x = question, y = time)) +
    geom_boxplot(fill = 'orange') + coord_flip() +
    ggtitle("Answering time per question")
}
pltSingleRespondent(dat, c(1, 31))
I wrote a function that draws a boxplot of the answering times for each question. Now I'd like to overlay that plot with the answering times of specific respondents (highlightUsers). The following image shows an example:
Can someone please explain how to do this?
Upvotes: 2
Views: 791
Reputation: 33782
Slightly different approach. Add a column to the data that indicates the highlighted users and map that variable to geom_line. Use scale_color_discrete(na.translate = FALSE) to color only the non-NA values.
library(dplyr)
library(ggplot2)
pltSingleRespondent <- function(df, highlightUsers) {
  df %>%
    mutate(User = factor(ifelse(user %in% highlightUsers, user, NA))) %>%
    ggplot(aes(question, time)) +
    geom_boxplot(fill = "orange") +
    geom_line(aes(color = User, group = User)) +
    ggtitle("Answering time per question") +
    scale_color_discrete(na.translate = FALSE) +
    coord_flip() +
    theme_bw()
}
Using the example data from @r2evans:
pltSingleRespondent(dat, c(1, 34))
Upvotes: 1
Reputation: 160577
I think the most direct way to do this is to subset your data within a call to geom_line.
I'll start with a different set of random data, since the sample data in the question does not include all questions for a user.
library(dplyr)
library(ggplot2)

set.seed(2021)
# every user answers every question
dat <- expand.grid(user = factor(1:50), question = paste0("q", 1:4))
dat$time <- round(rnorm(200, mean = 10, sd = 4), 0)
dat %>%
  ggplot(aes(x = question, y = time)) +
  geom_boxplot(fill = 'orange') + coord_flip() +
  ggtitle("Answering time per question") +
  # overlay lines for the selected users only
  geom_line(aes(color = user, group = user), size = 2,
            data = ~ subset(., user %in% c(1L, 34L)))
You can functionize it however you want. If you're using dplyr, you can use dplyr::filter instead of subset with no other change.
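As one possible way to wrap this up in a function, here is a minimal sketch. The name plt_highlight is made up for illustration, and the subset is computed up front with dplyr::filter rather than inside the geom_line call; it assumes df has the columns user, question and time, as in the example data.

```r
library(ggplot2)

# plt_highlight is a hypothetical name, not part of the answer above.
plt_highlight <- function(df, highlightUsers) {
  # Rows belonging to the respondents we want to overlay
  highlighted <- dplyr::filter(df, user %in% highlightUsers)

  ggplot(df, aes(x = question, y = time)) +
    geom_boxplot(fill = "orange") +
    # factor() keeps the color scale discrete even if user is numeric
    geom_line(aes(color = factor(user), group = user), data = highlighted) +
    coord_flip() +
    labs(title = "Answering time per question", color = "user")
}

# Example data in the same shape as above: every user answers every question
set.seed(2021)
dat <- expand.grid(user = 1:50, question = paste0("q", 1:4))
dat$time <- round(rnorm(nrow(dat), mean = 10, sd = 4), 0)

p <- plt_highlight(dat, c(1, 34))
```

Printing p draws the plot. The boxplot layer still sees the full data, while the line layer only sees the filtered rows.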
Also, I chose to use factor(user), since otherwise ggplot2 treats the variable as continuous (for color = user). You can choose whether to use this, though without it you may need more wrangling to get a discrete color scale.
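To illustrate the difference, here is a small sketch with made-up data (the data frame d and its values are invented for this example). With a numeric user, color is mapped to a continuous gradient; wrapping it in factor() inside aes() gives one discrete color per user.

```r
library(ggplot2)

# Toy data: two users, two questions each (values are made up)
d <- data.frame(
  user     = rep(c(1L, 34L), each = 2),
  question = rep(c("q1", "q2"), times = 2),
  time     = c(8, 12, 11, 9)
)

# Numeric user: ggplot2 picks a continuous color scale (gradient legend)
p_cont <- ggplot(d, aes(question, time, color = user, group = user)) +
  geom_line()

# factor(user): ggplot2 picks a discrete color scale (one legend key per user)
p_disc <- ggplot(d, aes(question, time, color = factor(user), group = user)) +
  geom_line()
```

Converting inside aes() leaves the underlying column untouched, which can be convenient if user is needed as a number elsewhere.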
Upvotes: 4