kmai2

Reputation: 33

Missing legend for ggplot

I am trying to plot a subset of the lines in my dataset, but I can't figure out how to get the legend to display properly, either directly or after using melt. The dataset has the following structure (the actual dataset has more forecasts and dates; this is just an example):

Date        Actual Fcst1 Fcst2 Fcst3 Fcst4
2015-01-01  500    600   700   400   450
2015-02-01  600    610   630   480   600
2015-03-01  700    234   875   754   733
..........  ...    ...   ...   ...   ...

I am currently using this code:

library(ggplot2)
library(scales)  # provides comma() for the y-axis labels

ggplot(df, aes(x = Date)) +
  geom_line(aes(y = Fcst1), color = "red", size = 1) +
  geom_line(aes(y = Fcst2), color = "blue", size = 1) +
  geom_line(aes(y = Fcst3), color = "green", size = 1) +
  geom_line(aes(y = Fcst4), color = "yellow", size = 1) +
  geom_line(aes(y = Fcst5), color = "purple", size = 1) +
  geom_line(aes(y = Fcst6), color = "orange", size = 1) +
  geom_line(aes(y = Actual), color = "black", size = 1.2) +
  ggtitle(label = "Actuals vs 2015 Forecasts", subtitle = fname) +
  ylab("Balance") +
  scale_y_continuous(labels = comma)

I can't get the legend to display properly no matter what, even when I try using a melt. Can someone help me please?

Upvotes: 2

Views: 204

Answers (1)

r2evans

Reputation: 160437

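ggplot2 prefers data in long format, and tends to "punish" (make hard) the wide-format approach you're using now. The legend is missing because every color = "..." in your code is set outside aes(): ggplot2 treats those as fixed settings rather than mappings, and only mapped aesthetics get a legend. Let's reshape (I'll use tidyr::pivot_longer; other reshaping tools work just as well).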
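To see what the reshape produces, here is a minimal sketch on the three sample rows from the question, typed in by hand as a hypothetical df (your real df has more rows and columns). With only Fcst1:Fcst4 selected, Actual stays its own column, so it is not drawn in the first plot below; the later Actual:Fcst4 version folds it in as one more series.

```r
library(tidyr)

# Hypothetical df: the three sample rows shown in the question
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)

# Stack Fcst1..Fcst4 into two columns: `name` holds the old column name and
# `value` holds the number. Date and Actual are carried along unchanged.
long <- pivot_longer(df, Fcst1:Fcst4)
long
```

Each original row becomes four rows (one per forecast), which is exactly what lets a single geom_line() with color = name draw every series and build the legend.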

library(ggplot2)
ggplot(tidyr::pivot_longer(df, Fcst1:Fcst4),
       aes(Date, value, color = name)) +
  geom_line()

[image: basic ggplot2]
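For contrast, if you really wanted to keep the wide format, the smallest change is to move a label string into aes() for each line, so it becomes a mapping. This is a sketch on the same hypothetical three-row df; it works, but every new forecast needs another geom_line() call, which is the repetition the long format avoids.

```r
library(ggplot2)

# Hypothetical df: the three sample rows shown in the question
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)

# A constant string *inside* aes() is a mapping, not a setting, so each
# layer adds one level to the colour scale and one entry to the legend.
p_wide <- ggplot(df, aes(x = Date)) +
  geom_line(aes(y = Fcst1, color = "Fcst1")) +
  geom_line(aes(y = Fcst2, color = "Fcst2")) +
  geom_line(aes(y = Actual, color = "Actual"))
```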

As you can tell, mapping color = name inside aes() gives each series its own color and a legend entry. If you want to control the colors, there are many ready-made palettes (e.g., viridis, and many designed with color-blind readers in mind); to set them manually, use scale_color_manual, as shown below. I'll also tweak the names and such a little.

ggplot(tidyr::pivot_longer(df, Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst"),
       aes(Date, value, color = Forecast)) +
  geom_line(size = 1) +
  scale_color_manual(values = c("Actual" = "black", "1" = "red", "2" = "blue",
                                "3" = "green", "4" = "yellow", "5" = "purple",
                                "6" = "orange")) +
  ggtitle(label = "Actuals vs 2015 Forecasts", subtitle = "(unk filename)") +
  ylab("Balance") +
  scale_y_continuous(labels = scales::comma)

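The manual colors don't have to be a perfect match: "5" and "6" are defined here but not used (based on your data sample). A series whose name is missing from the values= named vector gets no color of its own; depending on the ggplot2 version, it is either removed from the plot (with a warning) or drawn in na.value (grey by default).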

[image: same ggplot2, updated theme]
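If you'd rather not pick colors by hand, ggplot2 also ships discrete palettes such as viridis. A minimal sketch, again on the hypothetical three-row df from the question; scale_color_viridis_d() simply replaces scale_color_manual(), and the rest of the theming above still applies.

```r
library(ggplot2)
library(tidyr)

# Hypothetical df: the three sample rows shown in the question
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)

long <- pivot_longer(df, Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst")

# scale_color_viridis_d() is ggplot2's discrete viridis palette; it is
# designed to stay readable for common forms of colour-vision deficiency.
p_viridis <- ggplot(long, aes(Date, value, color = Forecast)) +
  geom_line() +
  scale_color_viridis_d()
```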

Finally, a common question is ordering the components in the legend. This can be done with factors:

df_long <- tidyr::pivot_longer(df, Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst")
df_long$Forecast <- relevel(factor(df_long$Forecast), "Actual")
ggplot(df_long, aes(Date, value, color = Forecast)) +
  geom_line(size = 1) +
  scale_color_manual(values = c("Actual" = "black", "1" = "red", "2" = "blue",
                                "3" = "green", "4" = "yellow", "5" = "purple",
                                "6" = "orange")) +
  ggtitle(label = "Actuals vs 2015 Forecasts", subtitle = "(unk filename)") +
  ylab("Balance") +
  scale_y_continuous(labels = scales::comma)

[image: same ggplot2, reordered legend]

I used stats::relevel to move one factor level "to the front"; otherwise the levels are sorted alphabetically (as shown in the second graphic above). There are many tools for working with factors; the forcats package is a popular one (especially among tidyverse users).
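relevel() moves a single level; for full control you can give factor() the complete levels= vector. One pitfall shows up once there are 10 or more forecasts: the default ordering sorts the labels as strings, so "10" lands between "1" and "2". A sketch with hypothetical labels (including a made-up tenth forecast) that puts Actual first and sorts the forecasts numerically:

```r
# Hypothetical labels, as produced by names_prefix = "Fcst",
# including a tenth forecast to show the string-sorting pitfall
fc_labels <- c("Actual", "1", "2", "10", "Actual", "1", "2", "10")

# Default levels sort as strings, so "10" comes before "2"
levels(factor(fc_labels))

# Explicit order: "Actual" first, then the forecasts sorted as numbers
fcst <- setdiff(unique(fc_labels), "Actual")
Forecast <- factor(fc_labels, levels = c("Actual", fcst[order(as.integer(fcst))]))
levels(Forecast)
```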

The reshaping and releveling could just as easily have been done within a single dplyr pipe.
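For example, a sketch of the reshape, relevel, and plot steps as one pipeline, using the hypothetical three-row df from the question. Note that %>% binds more tightly than +, so the ggplot layers can follow the pipe directly.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical df: the three sample rows shown in the question
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)

p_pipe <- df %>%
  # wide -> long; "Fcst1" becomes "1", while "Actual" is left as-is
  pivot_longer(Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst") %>%
  # put "Actual" first in the legend
  mutate(Forecast = relevel(factor(Forecast), "Actual")) %>%
  ggplot(aes(Date, value, color = Forecast)) +
  geom_line()
```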


Since you mentioned plotting batches of forecasts at a time, here are a few approaches. I'll augment the data by copying the four Fcst columns into another set of four (Fcst5 to Fcst8):

# df[, 3:6] is Fcst1..Fcst4 (column 1 is Date, column 2 is Actual)
df <- cbind(df, setNames(df[, 3:6], paste0("Fcst", 5:8)))
df_long <- tidyr::pivot_longer(df, Actual:Fcst8, names_to = "Forecast", names_prefix = "Fcst")
df_long$Forecast <- relevel(factor(df_long$Forecast), "Actual")

I'll "simplify" the plot for code brevity, though the theming will still work as above.

  1. Individual plots: filter to one batch of lines at a time and plot it.

    ggplot(df_long[df_long$Forecast %in% c("Actual", "1", "3", "5", "7"),],
           aes(Date, value, color = Forecast)) +
      geom_line(size = 1)
    
  2. Faceting. I'll show a brute-force way to do this for this example, then a (perhaps) more flexible way. I'm using dplyr here because it makes several of the operations much easier to see and understand (once you get used to the dplyr-esque syntax). (I often find that giving the control line, "Actual", a different color/thickness from the others helps solidify comparisons across the facets. Over to you.)

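    library(dplyr)
    library(tidyr)  # for crossing()
    # Forecasts 1-5 go in facet 1, forecasts 6-8 in facet 2
    df_rest <- df_long %>%
      filter(Forecast != "Actual") %>%
      mutate(grp = cut(as.integer(as.character(Forecast)), c(0, 5, 9), labels = FALSE))
    
    # Repeat the "Actual" rows once per facet so every facet shows the reference line
    # (df_long has no grp column yet, so there is nothing to drop before crossing)
    df_combined <- df_long %>%
      filter(Forecast == "Actual") %>%
      crossing(., unique(select(df_rest, grp))) %>%
      bind_rows(df_rest)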
    
    ggplot(df_combined, aes(Date, value, color = Forecast)) +
      geom_line(size = 1) +
      facet_grid(grp ~ .)
    

    [image: expanded data, ggplot2 faceted]

  3. Faceting, but with a more maintainable set of facets. I'll use a simple data.frame to control which lines are included in which $grp. This makes it much easier (imo) to "cherry pick" specific lines for specific facets.

    grps <- tibble::tribble(
      ~grp, ~Forecast
      ,1, "Actual"
      ,1, "1"
      ,1, "3"
      ,1, "5"
      ,2, "Actual"
      ,2, "2"
      ,2, "4"
      ,2, "6"
      ,2, "7"
      ,2, "8"
    )
    ggplot(left_join(df_long, grps, by = "Forecast"),
           aes(Date, value, color = Forecast)) +
      geom_line(size = 1) +
      facet_grid(grp ~ .)
    

    In this case, I used tribble solely to make it easier to see which lines go together; any data.frame will work. It also shows that the $grp sizes do not need to be equal: include whatever you want.

  4. Use the frame from #3 above for joining, then just filter on the group you want, as in

    left_join(df_long, grps, by = "Forecast") %>%
      filter(grp == 1) %>%
      ggplot(., aes(Date, value, color = Forecast)) +
      geom_line(size = 1) +
      facet_grid(grp ~ .)
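If you want separate plots rather than facets, approach #1 generalizes to a loop over batches. A minimal sketch, assuming the hypothetical three-row df from the question and a made-up batches list (adjust it to your own groupings); Actual is added to every batch as the reference line.

```r
library(ggplot2)
library(tidyr)

# Hypothetical df: the three sample rows shown in the question
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)

long <- pivot_longer(df, Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst")

# Hypothetical batches: which forecasts to compare together
batches <- list(c("1", "3"), c("2", "4"))

# One ggplot object per batch, each including the "Actual" line
plots <- lapply(batches, function(b) {
  ggplot(long[long$Forecast %in% c("Actual", b), ],
         aes(Date, value, color = Forecast)) +
    geom_line()
})
```

Each element of plots can then be printed, or written to a file with ggsave(), for example inside a loop over seq_along(plots).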
    

Upvotes: 5
