Bodhi Bose

Reputation: 33

How to add legends from stat_summary and remove legends from the main plot?

I want to plot the values of df1 by two groups, i.e. product and start_date, and also plot a crossbar with the mean of df1 (blue) and the mean of df2 (red), as in the attached diagram.

df1 <- data.frame(product = c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D","D"), 
                  start_date =as.Date(c('2020-02-01', '2020-02-02', '2020-02-03', '2020-02-04', '2020-02-05', '2020-02-06', '2020-02-07')),
                  value = c(15.71,17.37,19.93,14.28,15.85,10.5,8.58,5.62,5.19,5.44,4.6,7.04,6.29,3.3,20.35,27.92,23.07,12.83,22.28,21.32,31.46,34.82,23.68,29.11,14.48,25.2,16.91,27.79))

df2 <- data.frame(product = c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D"), 
                  start_date =as.Date(c('2019-07-09', '2019-07-10', '2019-07-11', '2019-07-12', '2019-07-13', '2019-07-14')),
                  value = c(9.06,10.74,14.64,7.67,8.72,11.21,4.76,4.53,3.81,4.32,3.95,5.2,20.36,21.17,19.51,16.25,17.93,16.94,14.51,14.65,23.28,10.84,16.71,12.48))

PLOT GRAPH

graph1 <- ggplot(df1, aes(
    y = value, x = product, fill = product, color = factor(start_date))) +
  geom_col(data = df1, stat = "identity",position = position_dodge(width = 0.8), width = 0.7, inherit.aes = TRUE, size = 0) + 
  xlab("Product") + ylab("Values")  + ylim(c(0,40)) + 
  scale_fill_manual(values=c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
  stat_summary(data = df1, aes(x = factor(product),y = value),fun = "mean",geom = "crossbar", color = "blue", size = 1, width = 0.8, inherit.aes = FALSE) +
  stat_summary(data = df2, aes(x = factor(product),y = value),fun = "mean",geom = "crossbar", color = "red", size = 1, width = 0.8, inherit.aes = FALSE) 

Is there any way to remove the borders of the bar plots and add a legend for the two crossbars at the top right corner of the plot?

Additionally, is there a way to add just the date from df1 below each bar in the plot?

Upvotes: 0

Views: 592

Answers (1)

chemdork123

Reputation: 13793

Your question about adjusting the plot has multiple parts. To summarize a few points:

  • Change from color=factor(start_date) to group= to remove the color around bars, but maintain the separation of individual bars by start_date

  • Use theme(legend.position=...) to place the legend precisely within the plot area. Add theme(legend.direction='horizontal') too when appropriate.

  • Add a color= aesthetic inside aes() in each stat_summary(geom='crossbar'...) call so that both crossbars get a legend entry, then use scale_color_manual to specify the colors if you don't like the defaults.

  • Minor suggestion: Use ylim(X,Y) instead of ylim(c(X,Y)). ylim accepts the two limits as separate arguments, so wrapping them in a vector is unnecessary. Both forms work, which is why this point is minor.

  • You don't need data=df1 in the first stat_summary call, since a layer without data= falls back to the data set in ggplot(...). You still need to map x= and y= there, because inherit.aes = FALSE stops the layer from inheriting them.

Here's the adjusted code from implementing the notes above:

ggplot(df1, aes(y = value, x = product, fill = product, group = factor(start_date))) +
    geom_col(data = df1, position = position_dodge(width = 0.8),
        width = 0.7, inherit.aes = TRUE, size = 0) +
    xlab("Product") + ylab("Values") + ylim(0,60) +
    scale_fill_manual(values=c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
    stat_summary(aes(x = factor(product), y=value, color='mean1'),
        fun = "mean", geom = "crossbar",
        size = 1, width = 0.8, inherit.aes = FALSE) +
    stat_summary(data = df2, aes(x = factor(product),y=value, color='mean2'),
        fun = "mean", geom = "crossbar",
        size = 1, width = 0.8, inherit.aes = FALSE) +
    theme(legend.position=c(0.75,0.8), legend.direction = 'horizontal') +
    scale_color_manual(values=c('blue', 'red'))


Explanation: Changing to group=factor(start_date) keeps the bars split by date within each product, a concept known as "dodging". Your original color= mapping was inside aes(), so it created a legend item, and geom_col also used it for dodging, since x and y only position the bars and fill= was already mapped to product. If you simply remove color=, you get one bar for each product. Even if you specify position='dodge', geom_col would not dodge them, because nothing tells it how to split the bars. That's why you include the group= aesthetic: it tells geom_col how to dodge.

Aesthetics inside aes() tell ggplot which legends to create. An aesthetic mapped to x or y is only used for positioning. group= is used for dodging and other grouping, but basically any other aesthetic (size, shape, color, fill, linetype, etc.) produces a legend. If both stat_summary calls include a color aesthetic, ggplot creates a single combined legend. The problem here is that there is no single column to map to color (because you have two datasets), so we map color to a constant string instead ("mean1" and "mean2").
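To make the constant-string trick concrete, here is a minimal sketch that draws only the two crossbar layers. The data frames are trimmed copies of the ones in the question (just product and value), and the legend labels "Mean of df1" and "Mean of df2" are my own wording, not something from the question. Naming the entries in values= ties each color to its key, so the result should not depend on the order in which the layers are added.

```r
library(ggplot2)

# Trimmed copies of the question's data: only the columns the crossbars need.
df1 <- data.frame(
  product = rep(c("A", "B", "C", "D"), each = 7),
  value = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
            5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
            20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
            34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)
df2 <- data.frame(
  product = rep(c("A", "B", "C", "D"), each = 6),
  value = c(9.06, 10.74, 14.64, 7.67, 8.72, 11.21,
            4.76, 4.53, 3.81, 4.32, 3.95, 5.2,
            20.36, 21.17, 19.51, 16.25, 17.93, 16.94,
            14.51, 14.65, 23.28, 10.84, 16.71, 12.48)
)

p <- ggplot(df1, aes(x = product, y = value)) +
  # color is mapped to a constant string, so each layer gets a legend key.
  # size = 1 follows the original code; recent ggplot2 versions prefer
  # linewidth for line-based geoms.
  stat_summary(aes(color = "mean1"), fun = "mean", geom = "crossbar",
               size = 1, width = 0.8) +
  stat_summary(data = df2, aes(color = "mean2"), fun = "mean",
               geom = "crossbar", size = 1, width = 0.8) +
  scale_color_manual(
    name = NULL,                                   # no legend title
    values = c(mean1 = "blue", mean2 = "red"),     # matched by key name
    labels = c("Mean of df1", "Mean of df2")       # same order as sorted keys
  )
p
```

The same scale_color_manual call can replace the one in the full plot above if you want readable legend text instead of "mean1" and "mean2".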

Final point: It might be easier to plot this if you combine your datasets. You may still want to indicate where they came from, so something like this works:

df1$origin_df <- 'df1'
df2$origin_df <- 'df2'
df <- rbind(df1, df2)

Then plot with df instead of df1. You can then use a single stat_summary call with aes(color=origin_df).
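A rough sketch of the combined-data version, under a couple of assumptions of mine: the data is rebuilt compactly with rep() and seq() (same values as in the question), and the bars are restricted to the df1 rows with subset() so that the df2 dates do not show up as extra bars. That restriction is my reading of the original plot, not something the question spells out.

```r
library(ggplot2)

# Same data as in the question, written compactly.
df1 <- data.frame(
  product = rep(c("A", "B", "C", "D"), each = 7),
  start_date = rep(seq(as.Date("2020-02-01"), by = "day", length.out = 7),
                   times = 4),
  value = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
            5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
            20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
            34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)
df2 <- data.frame(
  product = rep(c("A", "B", "C", "D"), each = 6),
  start_date = rep(seq(as.Date("2019-07-09"), by = "day", length.out = 6),
                   times = 4),
  value = c(9.06, 10.74, 14.64, 7.67, 8.72, 11.21,
            4.76, 4.53, 3.81, 4.32, 3.95, 5.2,
            20.36, 21.17, 19.51, 16.25, 17.93, 16.94,
            14.51, 14.65, 23.28, 10.84, 16.71, 12.48)
)

# Tag each row with its source, then stack the two data frames.
df1$origin_df <- "df1"
df2$origin_df <- "df2"
df <- rbind(df1, df2)

p <- ggplot(df, aes(x = product, y = value)) +
  # Bars: df1 rows only, dodged by date via the group aesthetic.
  geom_col(data = subset(df, origin_df == "df1"),
           aes(fill = product, group = factor(start_date)),
           position = position_dodge(width = 0.8), width = 0.7) +
  # One crossbar layer: a mean per product and per source data frame.
  stat_summary(aes(color = origin_df), fun = "mean", geom = "crossbar",
               size = 1, width = 0.8) +
  scale_fill_manual(values = c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
  scale_color_manual(values = c(df1 = "blue", df2 = "red")) +
  xlab("Product") + ylab("Values")
p
```

Because origin_df is now a real column, the color legend comes for free and no constant strings are needed.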

Upvotes: 2
