Reputation: 33
I am trying to plot a subset of lines for my dataset, but I can't seem to figure out how to get the legend to display properly, either normally or using a melt. The dataset has the following structure (more forecasts and dates in actual dataset, this is just an example):
Date Actual Fcst1 Fcst2 Fcst3 Fcst4
2015-01-01 500 600 700 400 450
2015-02-01 600 610 630 480 600
2015-03-01 700 234 875 754 733
.......... ... ... ... ... ...
I am currently using this code:
ggplot(df, aes(x = Date)) +
geom_line(aes(y = Fcst1), color = "red", size = 1) +
geom_line(aes(y = Fcst2),
color = "blue",
size = 1
) +
geom_line(aes(y = Fcst3),
color = "green",
size = 1
) +
geom_line(aes(y = Fcst4),
color = "yellow",
size = 1
) +
geom_line(aes(y = Fcst5),
color = "purple",
size = 1
) +
geom_line(aes(y = Fcst6), color = "orange", size = 1) +
geom_line(aes(y = Actual), color = "black", size = 1.2) +
ggtitle(label = "Actuals vs 2015 Forecasts", subtitle = fname) +
ylab("Balance") +
scale_y_continuous(labels = comma)
I can't get the legend to display properly no matter what, even when I try using a melt. Can someone help me please?
Upvotes: 2
Views: 204
Reputation: 160437
ggplot2 prefers data in long format, and tends to "punish" (make hard) the wide-format approach you're using now. Let's reshape (I'll use tidyr::pivot_longer; other reshaping tools work just as well).
library(ggplot2)
ggplot(tidyr::pivot_longer(df, Fcst1:Fcst4),
aes(Date, value, color = name)) +
geom_line()
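To see what the reshape produces before plotting, here is a minimal sketch using only the three sample rows from the question (the real data has more rows and forecasts). It relies on pivot_longer's default output column names, "name" and "value", which is why the plot above maps color = name.

```r
library(tidyr)

# Sample data copied from the question (first three rows only)
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)

# One row per Date/forecast pair; Date and Actual are carried along unchanged
df_long <- pivot_longer(df, Fcst1:Fcst4)
head(df_long, 4)
```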
As you can tell, mapping color= within an aesthetic (aes()) varies the colors by series. If you want to control the colors, there are many palettes available (e.g., viridis and several with color-blind-friendly profiles), but setting them manually is done with scale_color_manual, which I'll demo below. Finally, I'll tweak the names and such a little.
ggplot(tidyr::pivot_longer(df, Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst"),
aes(Date, value, color = Forecast)) +
geom_line(size = 1) +
scale_color_manual(values = c("Actual" = "black", "1" = "red", "2" = "blue",
"3" = "green", "4" = "yellow", "5" = "purple",
"6" = "orange")) +
ggtitle(label = "Actuals vs 2015 Forecasts", subtitle = "(unk filename)") +
ylab("Balance") +
scale_y_continuous(labels = scales::comma)
The manual colors don't have to match the data exactly, as you can see with 5 and 6 defined but not used (based on your data sample). Series that have no entry in the values= named vector will be removed from the plot (with a warning).
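A quick way to check for mismatches before plotting is to compare the series names with the names of the color vector. This is only a sketch in base R; `cols` and `forecasts` are illustrative names, and `forecasts` stands in for the unique values of the Forecast column in the sample data.

```r
# Named color vector as used in scale_color_manual(values = ...)
cols <- c("Actual" = "black", "1" = "red", "2" = "blue",
          "3" = "green", "4" = "yellow", "5" = "purple",
          "6" = "orange")

# Series present in the sample data after reshaping (assumed here)
forecasts <- c("Actual", "1", "2", "3", "4")

# Series in the data with no color defined (these are the problematic ones)
no_color <- setdiff(forecasts, names(cols))

# Colors defined but not used by any series (harmless)
unused <- setdiff(names(cols), forecasts)

no_color
unused
```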
Finally, a common question is ordering the components in the legend. This can be done with factors:
df_long <- tidyr::pivot_longer(df, Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst")
df_long$Forecast <- relevel(factor(df_long$Forecast), "Actual")
ggplot(df_long, aes(Date, value, color = Forecast)) +
geom_line(size = 1) +
scale_color_manual(values = c("Actual" = "black", "1" = "red", "2" = "blue",
"3" = "green", "4" = "yellow", "5" = "purple",
"6" = "orange")) +
ggtitle(label = "Actuals vs 2015 Forecasts", subtitle = "(unk filename)") +
ylab("Balance") +
scale_y_continuous(labels = scales::comma)
I used stats::relevel to move one factor level "to the front"; otherwise the legend order tends to be alphabetic (as shown in the second graphic above). There are many tools for working with factors; the forcats package is a popular one (especially among tidyverse users).
This processing could easily have been handled within a dplyr pipe.
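As a rough sketch of what that pipe might look like, the reshape and the releveling can be chained in one expression. The data frame below is just the three sample rows from the question, and the same relevel(factor(...)) call is used inside mutate.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (first three rows only)
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)

df_long <- df %>%
  # Reshape to long format, stripping the "Fcst" prefix from the series names
  pivot_longer(Actual:Fcst4, names_to = "Forecast", names_prefix = "Fcst") %>%
  # Convert to a factor and move "Actual" to the front of the level order
  mutate(Forecast = relevel(factor(Forecast), "Actual"))

levels(df_long$Forecast)
```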
Since you mentioned plotting batches of forecasts at a time, here are a couple of approaches. I'll augment the data by copying the Fcst columns into another set of 4:
df <- cbind(df, setNames(df[,3:6], paste0("Fcst", 5:8)))
df_long <- tidyr::pivot_longer(df, Actual:Fcst8, names_to = "Forecast", names_prefix = "Fcst")
df_long$Forecast <- relevel(factor(df_long$Forecast), "Actual")
I'll "simplify" the plot for code brevity, though the theming will still work as above.
Individual plots: filter one batch at a time and plot it.
ggplot(df_long[df_long$Forecast %in% c("Actual", "1", "3", "5", "7"),],
aes(Date, value, color = Forecast)) +
geom_line(size = 1)
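If there are several batches, the same filter-and-plot step can be wrapped in a loop. The sketch below assumes a hypothetical named list, `batches`, describing which forecasts go together; it uses only the three sample rows from the question and always keeps the "Actual" line in each plot.

```r
library(ggplot2)

# Sample data copied from the question (first three rows only)
df <- data.frame(
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")),
  Actual = c(500, 600, 700),
  Fcst1  = c(600, 610, 234),
  Fcst2  = c(700, 630, 875),
  Fcst3  = c(400, 480, 754),
  Fcst4  = c(450, 600, 733)
)
df_long <- tidyr::pivot_longer(df, Actual:Fcst4,
                               names_to = "Forecast", names_prefix = "Fcst")

# Hypothetical batches: which forecasts to show together
batches <- list(odd = c("1", "3"), even = c("2", "4"))

# Build one plot object per batch; "Actual" is included in every batch
plots <- lapply(batches, function(b) {
  ggplot(df_long[df_long$Forecast %in% c("Actual", b), ],
         aes(Date, value, color = Forecast)) +
    geom_line()
})
```

Each element of `plots` is an ordinary ggplot object, so `print(plots$odd)` draws the first batch and `ggsave()` can write each one to a file.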
Faceting. I'll show a brute-force way to do this for this example, then a (perhaps) more flexible way. I'm using dplyr here because it makes several of the operations much easier to see and understand (once you get used to the dplyr-esque syntax). (I often find that keeping the control line, "Actual", a different color/thickness than the others helps solidify comparisons across the facets. Over to you.)
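library(dplyr)
# Forecast lines only: forecasts 1-5 go to group 1, forecasts 6-8 to group 2
df_rest <- df_long %>%
  filter(Forecast != "Actual") %>%
  mutate(grp = cut(as.integer(as.character(Forecast)), c(0, 5, 9), labels = FALSE))
# Repeat the "Actual" line once per group so it appears in every facet
# (df_long has no grp column yet, so there is nothing to drop before crossing)
df_combined <- df_long %>%
  filter(Forecast == "Actual") %>%
  tidyr::crossing(distinct(df_rest, grp)) %>%
  bind_rows(df_rest)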
ggplot(df_combined, aes(Date, value, color = Forecast)) +
geom_line(size = 1) +
facet_grid(grp ~ .)
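The grouping in the brute-force version depends on how cut() bins the forecast numbers. A small base R check, using forecast numbers 1 through 8 as in the augmented data, shows which facet each forecast lands in; the variable names are only for illustration.

```r
# Forecast numbers in the augmented data
forecast_ids <- 1:8

# Breaks c(0, 5, 9) create the intervals (0, 5] and (5, 9];
# labels = FALSE returns the interval index instead of a factor
grp <- cut(forecast_ids, breaks = c(0, 5, 9), labels = FALSE)

# Forecasts 1-5 fall in group 1, forecasts 6-8 in group 2
data.frame(Forecast = forecast_ids, grp = grp)
```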
Faceting, but with a more maintainable set of facets. I'll use a simple data.frame to control which lines are included in which grp. This makes it much easier (imo) to "cherry pick" specific lines for specific facets.
grps <- tibble::tribble(
~grp, ~Forecast
,1, "Actual"
,1, "1"
,1, "3"
,1, "5"
,2, "Actual"
,2, "2"
,2, "4"
,2, "6"
,2, "7"
,2, "8"
)
ggplot(left_join(df_long, grps, by = "Forecast"),
aes(Date, value, color = Forecast)) +
geom_line(size = 1) +
facet_grid(grp ~ .)
In this case, I used tribble solely to make it easier to see which lines go together; any data.frame will work. I also demonstrate that the grp sizes do not need to be equal: include whatever you want.
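For instance, the same lookup table could be written as a plain base R data.frame. This is just an equivalent sketch of the grps frame above, with Forecast kept as character.

```r
# Same grp/Forecast pairs as the tribble version, built with base R
grps <- data.frame(
  grp      = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Forecast = c("Actual", "1", "3", "5",
               "Actual", "2", "4", "6", "7", "8"),
  stringsAsFactors = FALSE
)

# Number of lines assigned to each facet (unequal sizes are fine)
table(grps$grp)
```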
Individual plots from the same lookup frame: join with grps (defined for the previous faceting approach), then just filter on grp, as in
left_join(df_long, grps, by = "Forecast") %>%
filter(grp == 1) %>%
ggplot(., aes(Date, value, color = Forecast)) +
geom_line(size = 1) +
facet_grid(grp ~ .)
Upvotes: 5