J.Sabree

Reputation: 2536

How to specify ggplot legend order when you have multiple variables that are not all part of one column?

I'm plotting the same data at different time scales (Week, Month, Quarter, etc.) using ggplot, so I'm pulling the data from different columns. However, I want the legend entries to appear in a specific order.

I know that if all the grouping variables were in one column, I could set it as an ordered factor, as explained here, but my data are spread across multiple columns. I also tried the suggestions here about re-ordering multiple geoms, but they didn't work.

Because my actual dataset is very complex, I've reproduced a smaller version that just has week and month data. For the final answer, please show how to set an arbitrary order, not just something like rev(), because in my actual dataset I have 6 columns that need a specific order.

Here's code to reproduce the problem. The first 3 chunks make the dataset, so only the 4th chunk, which makes the plot, should be relevant to the actual solution. By default, R lists 'Score - Month' first in the legend, so I'd like to see how I could make it 2nd.

library(dplyr)
library(ggplot2)
library(lubridate)

#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
                Week_score = c(sample(100:200, 79)),
                Month = ymd(format(Week, "%Y-%m-01")))

#Generates month data -- shouldn't be relevant to troubleshoot                
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
                   Month_score = c(sample(150:200, 19)))

#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot  
all_time <- by_week %>%
  full_join(by_month) %>%
  mutate(helper = across(c(contains("Month")), ~paste(.))) %>% 
  mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
  mutate(Month = as.Date(Month))

#Makes plot - this is where I want the order in the legend to be different
all_time %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y= Week_score, colour = "Week_score")) +
  geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
  scale_colour_discrete(labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"))

Here's what the current legend looks like--I want the order switched with a solution that is scalable to more than 2 options. Thank you!

[Image: current legend, with 'Score - Month' listed above 'Score - Week']

Upvotes: 0

Views: 1635

Answers (1)

Quinten

Reputation: 41235

As @stefan correctly mentioned in the comments, you should list the names of your colour keys, in the order you want, in the limits argument of scale_colour_discrete(). To handle more columns, add their names to limits at the desired position. You can use the following code:

library(dplyr)
library(ggplot2)
library(lubridate)

#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
                  Week_score = c(sample(100:200, 79)),
                  Month = ymd(format(Week, "%Y-%m-01")))

#Generates month data -- shouldn't be relevant to troubleshoot                
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
                   Month_score = c(sample(150:200, 19)))

#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot  
all_time <- by_week %>%
  full_join(by_month) %>%
  mutate(helper = across(c(contains("Month")), ~paste(.))) %>% 
  mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
  mutate(Month = as.Date(Month))

#Makes plot - this is where I want the order in the legend to be different
all_time %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y= Week_score, colour = "Week_score")) +
  geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
  scale_colour_discrete(
    labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"),
    limits = c("Week_score", "Month_score") # the order given here is the legend order
  )

Output:

[Image: resulting plot, with 'Score - Week' listed first in the legend]

As you can see, the order of the legend entries has changed.
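
To scale this to more than two series, one option is to keep the keys and labels together in a single named vector and derive limits from it. The sketch below is only an illustration: the toy data frame df, the Quarter_score column, and the legend_keys vector are made up for the example and are not part of the original dataset. By default a discrete scale sorts its keys alphabetically, which is why "Month_score" came first; limits overrides that order.

```r
library(ggplot2)

# Hypothetical toy data with three score columns (not the asker's dataset)
set.seed(1)
df <- data.frame(
  x             = 1:10,
  Week_score    = runif(10),
  Month_score   = runif(10),
  Quarter_score = runif(10)
)

# Names are the colour keys used in aes(); values are the legend labels.
# The order of this vector is the order the legend will use.
legend_keys <- c(
  Week_score    = "Score - Week",
  Month_score   = "Score - Month",
  Quarter_score = "Score - Quarter"
)

p <- ggplot(df, aes(x = x)) +
  geom_line(aes(y = Week_score, colour = "Week_score")) +
  geom_line(aes(y = Month_score, colour = "Month_score")) +
  geom_line(aes(y = Quarter_score, colour = "Quarter_score")) +
  scale_colour_discrete(
    limits = names(legend_keys), # fixes the legend order
    labels = legend_keys         # matched to the keys by name
  )

p
```

Reordering the legend is then just a matter of reordering legend_keys; with 6 columns you would add three more entries and three more geom_line() calls.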

Upvotes: 1
