sunshinegirl

Reputation: 81

Creating a custom legend in R

I'm hoping to have a legend that includes references to all colours, not just the vertical lines, and does not include a title.

I've tried scale_colour_manual and scale_fill_manual and they all either overlap or only show the vertical lines. I would appreciate any suggestions.

Reprex is below, including the custom colour palette.

library(data.table)  # for setDT()
library(ggplot2)

var1 <- c(head(randu$x,n=12))
var2 <- as.Date(c("2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
var3 <- c(tail(randu[which(randu$x + randu$y < 1),]$x,n=12))
var4 <- c(tail(randu[which(randu$x + randu$y < 1),]$y,n=12))

dat <- data.frame(var1,var2,var3,var4)
setDT(dat)
dat$var5 <- dat[,(var3+var4)]

new_dates <- as.Date(c("2010-09-01","2010-05-01"))

cbp2 <- c("#000000", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")

ggplot()+
  geom_bar(data=dat,colour=cbp2[1],fill = cbp2[1],aes(x=var2,y=var5,colour="var4"),stat="identity")+
  geom_bar(data=dat,colour=cbp2[2],fill = cbp2[2],aes(x=var2,y=var3,colour="var3"),stat="identity")+
  geom_line(data=dat,colour=cbp2[1],aes(x=var2,y=var1))+
  geom_vline(data=data.frame(xintercept = new_dates),
             aes(xintercept = new_dates,linetype = "Changes", colour="red"),
             linetype="dashed",key_glyph = "path")+
  scale_color_manual(name = "",
                     values = c("red",cbp2[2],cbp2[1]), 
                     breaks = c("red",cbp2[2],cbp2[1]),
                     labels = c("Changes","Var3","Var4"))+
  scale_fill_manual(name = "",
                    values = c(cbp2[2],cbp2[1]), 
                    breaks = c(cbp2[2],cbp2[1]),
                    labels = c("var3","var4"))+
  ylab("")+
  xlab("")+
  scale_x_date(expand=c(0,0),date_breaks = "3 month", date_labels =  "%b %y") + 
  scale_y_continuous(labels = function(var5) paste0(var5*100, "%"), 
                     limits=c(0,1),
                     breaks=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1)) +
  theme(panel.background = element_blank(),
        axis.line = element_line(colour = "#000000"),
        axis.text.x = element_text(angle=60, hjust=1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.title.x= (element_text(margin = unit(c(3, 0, 0, 0), "mm"))),
        legend.position = "top")


Upvotes: 0

Views: 387

Answers (1)

chemdork123

Reputation: 13793

There's quite a lot to unpack here with this one, but I gave it my best shot.

First of all, consider what you are trying to plot here. Normally, it's not a problem to call things var1, var2, var3,...; however, in this context it's really quite confusing. For that reason, I am re-posting your entire code reworked, not just the plotting portion, and I explain why as I go.

The Data and the Question

With all that being said, here is my understanding about the nature of the dataset and your desire for the final plot:

  • var2 in the dataset contains Date class information, and this is the common x axis for the entire plot.

  • var1 contains values that are to be used for the y values of the geom_line plot layer

  • var3 and var4 contain values that are to be used for creation of the stacked barplot which should make up the background of the plot

  • var5 is a sum of var3 + var4, and was a device to create the plot. Herein, it will not be useful, given the data analysis we are to do on the dataset and the application of Tidy Data principles.

  • xintercept values for the geom_vline plot layer are supplied as the two dates in new_dates

The OP's question indicates a need for the Legend to be displayed correctly. In this case, we want to indicate:

  • fill color of the bars as var3 and var4
  • the nature of the vertical lines as dashed red lines, called "Changes"
  • A label for the geom_line plot layer. Assume the label will be var1.

Hope all that was correct!

Synthesizing the Dataset

I encourage the OP to read up on Tidy Data principles, which will make reshaping data such as this much more straightforward in the future. Herein, I will apply these principles to the dataset dat.

First of all, let's handle the bar layer data. Applying Tidy Data principles, we want to gather var3 and var4 into two columns: (1) one for the name of the variable ("var3" or "var4"), and (2) one for the value. We will be telling ggplot2 to "stack" the bars, so var5 is not needed here: ggplot2 will do that calculation automatically. To gather the columns together, my preference is to use gather() from tidyr (dplyr is loaded here for the %>% pipe):

library(dplyr)
library(tidyr)
library(ggplot2)
library(data.table)

var1 <- c(head(randu$x,n=12))
var2 <- as.Date(c("2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
var3 <- c(tail(randu[which(randu$x + randu$y < 1),]$x,n=12))
var4 <- c(tail(randu[which(randu$x + randu$y < 1),]$y,n=12))

dat <- data.frame(var1,var2,var3,var4)
setDT(dat)
# dat$var5 <- dat[,(var3+var4)]   no longer needed
new_dates <- as.Date(c("2010-09-01","2010-05-01"))
cbp2 <- c("#000000", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")

newdat <- dat %>% 
  gather(key='var_name', value='value', -var2) # gather all columns except for var2

names(newdat) <- c('Dates', 'var_name', 'value')
newdat$var_name <- factor(newdat$var_name, levels=c('var4', 'var3','var1'))

In addition to gathering the columns, you will note that I'm renaming the columns to make them a bit easier to follow when it comes to plotting. I'm also setting the order of the levels for newdat$var_name. The purpose here is that the order we specify controls the ordering used to create the plot. I want var3 to appear as a bar "under" var4, so we need to specify that var4 is first.
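
As a side note, gather() is superseded in current tidyr by pivot_longer(). The following is a minimal sketch of the same reshaping with pivot_longer(), run on a small stand-in data frame; dat_demo, long_demo and the values in them are made up for illustration and are not part of the OP's data.

```r
library(tidyr)

# Small stand-in for `dat` (hypothetical values, two dates only)
dat_demo <- data.frame(
  var1 = c(0.10, 0.20),
  var2 = as.Date(c("2010-01-01", "2010-02-01")),
  var3 = c(0.30, 0.40),
  var4 = c(0.25, 0.15)
)

# Long format: one row per date and variable, keeping var2 as the id column
long_demo <- pivot_longer(
  dat_demo,
  cols = -var2,
  names_to = "var_name",
  values_to = "value"
)

# Rename the date column and fix the level order, as in the gather() version
names(long_demo)[names(long_demo) == "var2"] <- "Dates"
long_demo$var_name <- factor(long_demo$var_name,
                             levels = c("var4", "var3", "var1"))

levels(long_demo$var_name)
```

The column order differs from the gather() result (the date column comes first here), which is why the sketch renames var2 by name instead of assigning all three names at once.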

You could also create a separate dataset containing var2 and var1 to use for plotting the geom_line layer... but this also works fine.
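
For reference, a rough sketch of that alternative: split the long data once into a bar dataset and a line dataset, then hand each one to its own layer. The names newdat_demo, bar_dat and line_dat, and the values, are assumptions made up for this illustration.

```r
# Stand-in for the reshaped `newdat` (hypothetical values)
newdat_demo <- data.frame(
  Dates = rep(as.Date(c("2010-01-01", "2010-02-01")), times = 3),
  var_name = factor(rep(c("var1", "var3", "var4"), each = 2),
                    levels = c("var4", "var3", "var1")),
  value = c(0.50, 0.60, 0.30, 0.40, 0.25, 0.15)
)

# One dataset per layer; droplevels() removes the unused factor levels
bar_dat  <- droplevels(subset(newdat_demo, var_name != "var1"))
line_dat <- droplevels(subset(newdat_demo, var_name == "var1"))

levels(bar_dat$var_name)
levels(line_dat$var_name)
```

bar_dat would then go to geom_col(data = bar_dat, ...) and line_dat to geom_line(data = line_dat, ...), instead of calling subset() inside each layer.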

The Plot

For the plot, I've tried to organize the code into separate sections. What OP was trying to do was to plot column-by-column, rather than using aes(fill= and aes(color= to set and create legends. In addition, the OP's original code had numerous examples of the following:

geom_*(aes(color=...), color=...)

In ggplot2, if you set an aesthetic (like color=) outside of aes() while also mapping it inside aes(), the value set outside overrides the mapping, so the layer no longer contributes that aesthetic to any legend. This was the biggest cause of trouble in the OP's example, and why certain items were the "right" color but did not appear in any legend.
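
A small sketch of that behaviour, using a made-up three-point data frame (df, p_mapped and p_fixed are names invented for this illustration). It uses layer_data() from ggplot2 to look at the colours each layer ends up with.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 1, 3))

# colour mapped inside aes() only: a colour scale (and legend) is created
p_mapped <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = "var3"))

# colour mapped inside aes() AND set outside: the fixed value takes over
p_fixed <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = "var3"), colour = "black")

unique(layer_data(p_mapped)$colour)  # colour picked by the default scale
unique(layer_data(p_fixed)$colour)   # the fixed colour set outside aes()
```

Printing p_fixed should show black points and no colour legend, which mirrors what happened to the bar and line layers in the OP's code.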

Specifying arguments in aes() only indicates that a legend should be created and tells ggplot2 on what basis to apply color, fill, linetype...; it does not actually specify the color. The colors themselves should be specified using the scale_*_*() functions. In this case, three legends are created. The OP can organize them however they wish; I kept this example illustrative so it can be adapted, since it is still not entirely clear how the final legend should look.

Note that values= is used to apply the color, linetype, or fill aesthetic, and is done by feeding that argument a named vector. You can also use a non-named vector, in which case the attributes will be applied according to the ordering of the levels for that factor.
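
To make the named versus non-named distinction concrete, here is a minimal sketch on a two-row stand-in data frame (df, base, p_named and p_unnamed are hypothetical names). Both versions should give the same fills, because the non-named vector is written in level order (var4, then var3).

```r
library(ggplot2)

cbp2 <- c("#000000", "#56B4E9")

df <- data.frame(
  x = c(1, 2),
  value = c(0.2, 0.3),
  var_name = factor(c("var3", "var4"), levels = c("var4", "var3"))
)

base <- ggplot(df, aes(x, value, fill = var_name)) + geom_col()

# Named vector: matched by level name, so the order does not matter
p_named <- base +
  scale_fill_manual(values = c(var3 = cbp2[2], var4 = cbp2[1]))

# Non-named vector: matched by level order (var4 first, then var3)
p_unnamed <- base +
  scale_fill_manual(values = c(cbp2[1], cbp2[2]))

ld_named   <- layer_data(p_named)
ld_unnamed <- layer_data(p_unnamed)

# Fill per bar, ordered by x position (x = 1 is var3, x = 2 is var4)
ld_named$fill[order(ld_named$x)]
ld_unnamed$fill[order(ld_unnamed$x)]
```

The named form is usually the safer choice, since it keeps working if the factor levels are reordered later.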

Note that I changed the line color of the geom_line to blue... just so that it stands out a bit. It would be a bit confusing otherwise, since there is a fill color that is also black.

ggplot(newdat, aes(x=Dates, y=value)) +   # newdat holds the Dates and value columns
  
  # plot layers
  geom_col(
    data=subset(newdat, var_name != 'var1'),
    aes(fill=var_name), position='stack') +
  geom_line(
    data=subset(newdat, var_name == 'var1'),
    aes(color=var_name)
  ) +
  geom_vline(
    data=data.frame(xintercept = new_dates),
    aes(xintercept = xintercept, linetype = "Changes"),
    colour="red", key_glyph = "path") +
 
  # color and legend settings 
  scale_fill_manual(
    name="Fill",
    values=c('var3'=cbp2[2], 'var4'=cbp2[1])) +
  
  scale_color_manual(
    name='Color',
    values = 'blue') +
  
  scale_linetype_manual(
    name='Linetype',
    values=2) +

  # scale adjustment and theme stuff
  scale_y_continuous(labels = function(var5) paste0(var5*100, "%"),
                     limits=c(0,1),
                     breaks=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1)) +
  
  theme(panel.background = element_blank(),
        axis.line = element_line(colour = "#000000"),
        axis.text.x = element_text(angle=60, hjust=1),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.title.x= (element_text(margin = unit(c(3, 0, 0, 0), "mm"))),
        legend.position = "top")
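
Since the question asked for a legend without a title, one possible adjustment is to pass name = NULL to each scale instead of "Fill", "Color" and "Linetype". The sketch below shows this on a reduced stand-in plot; df, vlines and the numeric x values are assumptions made up for illustration, not the OP's data.

```r
library(ggplot2)

df <- data.frame(
  x = c(1, 2, 1, 2),
  value = c(0.2, 0.3, 0.1, 0.2),
  var_name = factor(c("var3", "var3", "var4", "var4"),
                    levels = c("var4", "var3"))
)
vlines <- data.frame(xint = 1.5)  # stand-in for new_dates

p <- ggplot(df, aes(x, value)) +
  geom_col(aes(fill = var_name), position = "stack") +
  geom_vline(data = vlines,
             aes(xintercept = xint, linetype = "Changes"),
             colour = "red", key_glyph = "path") +
  # name = NULL drops the legend title for that scale
  scale_fill_manual(name = NULL,
                    values = c(var3 = "#56B4E9", var4 = "#000000")) +
  scale_linetype_manual(name = NULL, values = 2) +
  theme(legend.position = "top")

p
```

The same name = NULL argument could be given to scale_color_manual() in the full plot above.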


Upvotes: 2
