Reputation: 1367
I'm having trouble combining color and linetype guides into a single legend in a plot produced with ggplot2. Either the linetype shows up with all of the linetypes keyed the same way, or it does not show up at all.
My plot includes a ribbon showing the bulk of the observations, along with lines showing the minimum, median, maximum, and sometimes the observations from a single year.
Example code using the built-in CO2 data set:
library(tidyverse)
myExample <- CO2 %>%
group_by(conc) %>%
summarise(d.min = min(uptake, na.rm= TRUE),
d.ten = quantile(uptake,probs = .1, na.rm = TRUE),
d.median = median(uptake, na.rm = TRUE),
d.ninty = quantile(uptake, probs = .9, na.rm= TRUE),
d.max = max(uptake, na.rm = TRUE))
myExample <- cbind(myExample, "Qn1"= filter(CO2, Plant == "Qn1")[,5])
plot_plant <- TRUE # Switch to plot single observation series
myExample %>%
ggplot(aes(x=conc))+
geom_ribbon(aes(ymin=d.ten, ymax= d.ninty, fill = "80% of observations"), alpha = .2)+
geom_line(aes(y=d.min, colour = "c"), linetype = 3, size = .5)+
geom_line(aes(y=d.median, colour = "e"),linetype = 2, size = .5)+
geom_line(aes(y=d.max, colour = "a"),linetype = 3, size = .5)+
{if(plot_plant)geom_line(aes(y=Qn1, color = "f"), linetype = 1,size =.5)}+
scale_fill_manual("Statistic", values = "blue")+
scale_color_brewer(palette = "Dark2",name = "",
labels = c(
a= "Maximum",
e= "Median",
c= "Minimum",
f = current_year
), breaks = c("a","e","c","f"))+
scale_linetype_manual(name = "")+
guides(fill= guide_legend(order = 1), color = guide_legend(order = 2), linetype = guide_legend(order = 2))
With plot_plant set to TRUE, the code plots a single observation series, but linetype does not show up at all in the legend:
With plot_plant set to FALSE, linetype shows up in the legend, but I cannot see the distinction between the dotted and dashed legend entries:
The plot is working as desired, but I would like the linetype distinctions to show up in the legend. Visually, it is more important when I'm plotting the single observation series because the distinction between solid and dashed or dotted is stronger.
Searching for answers, I've seen suggestions to combine the different stats (min, median, max, and the single series) into a single variable and let ggplot determine the linetypes (e.g. the post "ggplot2 manually specifying color & linetype - duplicate legend"), or to make a hash that describes the linetype (for example, "How to rename a (combined) legend in ggplot2?"), but neither of these approaches seems to play well in combination with the ribbon plot.
I tried reshaping my data into a long format, which usually works well for ggplot. This worked when I plotted all of the statistics as line geometry, but I couldn't get the ribbon to work the way I wanted, and overlaying a single observation series seemed to require storing it in a different data table.
Upvotes: 1
Views: 1421
Reputation: 8686
As you noted, ggplot loves long-format data, so I recommend sticking with that.
Here I generate some made-up data:
library(tibble)
library(dplyr)
library(ggplot2)
library(tidyr)
set.seed(42)
tibble(x = rep(1:10, each = 10),
y = unlist(lapply(1:10, function(x) rnorm(10, x)))) -> tbl_long
which looks like this:
# A tibble: 100 x 2
x y
<int> <dbl>
1 1 2.37
2 1 0.435
3 1 1.36
4 1 1.63
5 1 1.40
6 1 0.894
7 1 2.51
8 1 0.905
9 1 3.02
10 1 0.937
# ... with 90 more rows
Then I group_by(x) and calculate the quantiles of interest for y in each group:
tbl_long %>%
group_by(x) %>%
mutate(q_0.0 = quantile(y, probs = 0.0),
q_0.1 = quantile(y, probs = 0.1),
q_0.5 = quantile(y, probs = 0.5),
q_0.9 = quantile(y, probs = 0.9),
q_1.0 = quantile(y, probs = 1.0)) -> tbl_long_and_wide
and that looks like:
# A tibble: 100 x 7
# Groups: x [10]
x y q_0.0 q_0.1 q_0.5 q_0.9 q_1.0
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2.37 0.435 0.848 1.38 2.56 3.02
2 1 0.435 0.435 0.848 1.38 2.56 3.02
3 1 1.36 0.435 0.848 1.38 2.56 3.02
4 1 1.63 0.435 0.848 1.38 2.56 3.02
5 1 1.40 0.435 0.848 1.38 2.56 3.02
6 1 0.894 0.435 0.848 1.38 2.56 3.02
7 1 2.51 0.435 0.848 1.38 2.56 3.02
8 1 0.905 0.435 0.848 1.38 2.56 3.02
9 1 3.02 0.435 0.848 1.38 2.56 3.02
10 1 0.937 0.435 0.848 1.38 2.56 3.02
# ... with 90 more rows
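As an aside, mutate() keeps all 100 rows, so each quantile is repeated ten times within its x group. If the raw y values are not needed for the plot, a summarise()-based variant keeps one row per x instead. The sketch below rebuilds the made-up data so it runs on its own; the name tbl_summary is only illustrative and is not used in the rest of this answer, which continues with tbl_long_and_wide.

```r
library(tibble)
library(dplyr)

set.seed(42)
tbl_long <- tibble(x = rep(1:10, each = 10),
                   y = unlist(lapply(1:10, function(m) rnorm(10, m))))

# summarise() collapses each x group to a single row of quantiles,
# whereas mutate() repeats the quantiles on every row of the group.
tbl_summary <- tbl_long %>%
  group_by(x) %>%
  summarise(q_0.0 = quantile(y, probs = 0.0),
            q_0.1 = quantile(y, probs = 0.1),
            q_0.5 = quantile(y, probs = 0.5),
            q_0.9 = quantile(y, probs = 0.9),
            q_1.0 = quantile(y, probs = 1.0))
```

The trade-off is that the y column is gone, so this form only suits plots that draw the summary statistics.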
Then I gather all the columns except x, y, and the 10th- and 90th-percentile variables into two variables: key and value. The new key variable holds the name of the old variable that each value came from. The other variables are just copied down as needed.
tbl_long_and_wide %>%
gather(key, value, -x, -y, -q_0.1, -q_0.9) -> tbl_super_long
and that looks like:
# A tibble: 300 x 6
# Groups: x [10]
x y q_0.1 q_0.9 key value
<int> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 2.37 0.848 2.56 q_0.0 0.435
2 1 0.435 0.848 2.56 q_0.0 0.435
3 1 1.36 0.848 2.56 q_0.0 0.435
4 1 1.63 0.848 2.56 q_0.0 0.435
5 1 1.40 0.848 2.56 q_0.0 0.435
6 1 0.894 0.848 2.56 q_0.0 0.435
7 1 2.51 0.848 2.56 q_0.0 0.435
8 1 0.905 0.848 2.56 q_0.0 0.435
9 1 3.02 0.848 2.56 q_0.0 0.435
10 1 0.937 0.848 2.56 q_0.0 0.435
# ... with 290 more rows
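gather() still works, but tidyr 1.0.0 and later mark it as superseded by pivot_longer(). A minimal sketch of the same reshape with pivot_longer() follows, assuming a tidyr version that provides it. The data are rebuilt here so the block runs on its own. The rows come out in a different order than with gather(), which does not matter for plotting.

```r
library(tibble)
library(dplyr)
library(tidyr)

set.seed(42)
tbl_long <- tibble(x = rep(1:10, each = 10),
                   y = unlist(lapply(1:10, function(m) rnorm(10, m))))

tbl_long_and_wide <- tbl_long %>%
  group_by(x) %>%
  mutate(q_0.0 = quantile(y, probs = 0.0),
         q_0.1 = quantile(y, probs = 0.1),
         q_0.5 = quantile(y, probs = 0.5),
         q_0.9 = quantile(y, probs = 0.9),
         q_1.0 = quantile(y, probs = 1.0))

# Stack only the columns that will be drawn as lines; x, y, q_0.1 and
# q_0.9 are left in place and repeated for each key.
tbl_super_long <- tbl_long_and_wide %>%
  pivot_longer(cols = c(q_0.0, q_0.5, q_1.0),
               names_to = "key",
               values_to = "value")
```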
This format lets you use both geom_ribbon() and geom_line() the way you want: the values for the lines are contained in value and grouped by key, whereas the variables mapped to ymin and ymax are kept separate from value and are constant within each x group.
tbl_super_long %>%
  ggplot() +
  geom_ribbon(aes(x = x,
                  ymin = q_0.1,
                  ymax = q_0.9,
                  fill = "80% of observations"),
              alpha = 0.2) +
  geom_line(aes(x = x,
                y = value,
                color = key,
                linetype = key)) +
  # Legend titles are plain strings; name = NULL drops the title
  scale_fill_manual(name = "Statistic",
                    guide = guide_legend(order = 1),
                    values = viridisLite::viridis(1)) +
  # color and linetype use the same name, labels, and guide,
  # so ggplot2 merges them into a single legend
  scale_color_manual(name = NULL,
                     labels = c("Minimum", "Median", "Maximum"),
                     guide = guide_legend(reverse = TRUE, order = 2),
                     values = viridisLite::viridis(3)) +
  scale_linetype_manual(name = NULL,
                        labels = c("Minimum", "Median", "Maximum"),
                        guide = guide_legend(reverse = TRUE, order = 2),
                        values = c("dotted", "dashed", "solid")) +
  labs(x = "x", y = "y")
This data format, with the long but grouped x and value variables plus the independent but repeated ymin and ymax variables, lets you use both geom_ribbon() and geom_line() and allows the linetypes to show up properly in the legend.
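For the CO2 data in the question, the same idea might look like the sketch below. It is one possible arrangement rather than the only one: the ribbon reads from a one-row-per-conc summary, while all lines, including the single plant series, are stacked into one long table and mapped to both colour and linetype. The names co2_stats, co2_plant, co2_lines, and the "Plant Qn1" label are illustrative assumptions, and pivot_longer() requires tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

co2 <- as.data.frame(CO2)  # drop the groupedData classes

# One row per concentration, used by the ribbon
co2_stats <- co2 %>%
  group_by(conc) %>%
  summarise(d.min    = min(uptake),
            d.ten    = quantile(uptake, probs = 0.1),
            d.median = median(uptake),
            d.ninety = quantile(uptake, probs = 0.9),
            d.max    = max(uptake))

# The single observation series, already in long form
co2_plant <- co2 %>%
  filter(as.character(Plant) == "Qn1") %>%
  transmute(conc, stat = "Qn1", value = uptake)

# All lines in one long table; factor levels fix the legend order
co2_lines <- co2_stats %>%
  select(conc, d.min, d.median, d.max) %>%
  pivot_longer(-conc, names_to = "stat", values_to = "value") %>%
  bind_rows(co2_plant) %>%
  mutate(stat = factor(stat, levels = c("d.max", "d.median", "d.min", "Qn1")))

stat_labels <- c("Maximum", "Median", "Minimum", "Plant Qn1")

p <- ggplot() +
  geom_ribbon(data = co2_stats,
              aes(x = conc, ymin = d.ten, ymax = d.ninety,
                  fill = "80% of observations"),
              alpha = 0.2) +
  geom_line(data = co2_lines,
            aes(x = conc, y = value, colour = stat, linetype = stat)) +
  scale_fill_manual(name = "Statistic", values = "blue",
                    guide = guide_legend(order = 1)) +
  # Same name and labels on both scales, so the two legends merge
  scale_colour_brewer(palette = "Dark2", name = NULL, labels = stat_labels,
                      guide = guide_legend(order = 2)) +
  scale_linetype_manual(name = NULL, labels = stat_labels,
                        values = c(d.max = "dotted", d.median = "dashed",
                                   d.min = "dotted", Qn1 = "solid"),
                        guide = guide_legend(order = 2))

p
```

To leave out the single series, skip the bind_rows() step and drop "Qn1" from the levels, labels, and values. Minimum and maximum share the dotted linetype here, as in the question, so colour is what tells them apart in the legend.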
Upvotes: 0