I want to plot the values of df1 by two groups, i.e., product and start_date, and also plot a crossbar with the mean of df1 (blue) and the mean of df2 (red), as in the attached diagram.
library(ggplot2)

df1 <- data.frame(product = c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D","D"),
                  # 7 dates, recycled across the 28 rows (4 products x 7 days)
                  start_date = as.Date(c('2020-02-01', '2020-02-02', '2020-02-03', '2020-02-04', '2020-02-05', '2020-02-06', '2020-02-07')),
                  value = c(15.71,17.37,19.93,14.28,15.85,10.5,8.58,5.62,5.19,5.44,4.6,7.04,6.29,3.3,20.35,27.92,23.07,12.83,22.28,21.32,31.46,34.82,23.68,29.11,14.48,25.2,16.91,27.79))
df2 <- data.frame(product = c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D"),
                  # 6 dates, recycled across the 24 rows (4 products x 6 days)
                  start_date = as.Date(c('2019-07-09', '2019-07-10', '2019-07-11', '2019-07-12', '2019-07-13', '2019-07-14')),
                  value = c(9.06,10.74,14.64,7.67,8.72,11.21,4.76,4.53,3.81,4.32,3.95,5.2,20.36,21.17,19.51,16.25,17.93,16.94,14.51,14.65,23.28,10.84,16.71,12.48))
graph1 <- ggplot(df1, aes(
    y = value, x = product, fill = product, color = factor(start_date))) +
  geom_col(data = df1, position = position_dodge(width = 0.8), width = 0.7, inherit.aes = TRUE, size = 0) +
  xlab("Product") + ylab("Values") + ylim(c(0,40)) +
  scale_fill_manual(values=c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
  stat_summary(data = df1, aes(x = factor(product),y = value),fun = "mean",geom = "crossbar", color = "blue", size = 1, width = 0.8, inherit.aes = FALSE) +
  stat_summary(data = df2, aes(x = factor(product),y = value),fun = "mean",geom = "crossbar", color = "red", size = 1, width = 0.8, inherit.aes = FALSE)
Is there any way to remove the borders of the bars and add a legend for the two crossbars in the top-right corner of the plot?
Additionally, I would like to know if there is a way to add just the date from df1 below each bar in the plot.
Your question about adjusting the plot has multiple parts. To summarize a few points:
Change color = factor(start_date) to group = factor(start_date) inside aes() to remove the colored border around the bars while keeping the individual bars separated by start_date.
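Use theme(legend.position = ...) to place the legend precisely within the plot area (a numeric pair such as c(0.75, 0.8) gives its position as fractions of the plot; in ggplot2 3.5.0 and later this numeric form is deprecated in favor of legend.position = "inside" together with legend.position.inside). Add legend.direction = 'horizontal' to theme() too when appropriate.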
Map color = inside aes() in both stat_summary(geom = 'crossbar', ...) calls so that both crossbars are added to a legend, then use scale_color_manual() to set the colors if you don't like the defaults.
Minor suggestion: use ylim(X, Y) instead of ylim(c(X, Y)). There's no need to wrap the limits in a vector, since ylim() accepts them as separate arguments, which is simpler. Both forms work, which is why this point is minor.
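You don't need data = df1 in the first stat_summary() call, because a layer without its own data = argument uses the data set in ggplot(...). inherit.aes = FALSE only stops the aesthetics from being inherited, not the data. You do still need y = (and x =) in that layer's aes(), though, since with inherit.aes = FALSE nothing from the top-level aes() is carried over.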
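The last two points can be checked by building a plot and inspecting what ggplot2 computed with layer_data(). This is a minimal sketch, assuming ggplot2 is installed; df1 is rebuilt compactly from the question's values. The stat_summary() layer has no data = argument and uses inherit.aes = FALSE, yet it still summarises df1.

```r
library(ggplot2)

# The question's df1, written compactly: 4 products x 7 consecutive dates
df1 <- data.frame(
  product    = rep(c("A", "B", "C", "D"), each = 7),
  start_date = rep(as.Date("2020-02-01") + 0:6, times = 4),
  value      = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
                 5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
                 20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
                 34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)

# No data = argument: the layer falls back to the data given to ggplot().
# inherit.aes = FALSE drops only the inherited aesthetics, so x and y are restated.
p <- ggplot(df1, aes(x = product, y = value, fill = product)) +
  stat_summary(aes(x = product, y = value), fun = mean,
               geom = "crossbar", inherit.aes = FALSE)

# One computed mean per product, taken from df1
crossbar_means <- layer_data(p, 1)$y
print(crossbar_means)

# ylim() takes the two limits directly; wrapping them in c() yields the same limits
identical(ylim(0, 40)$limits, ylim(c(0, 40))$limits)
```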
Here's the adjusted code after applying the notes above:
library(ggplot2)

# group = splits (dodges) the bars by date without drawing a colored border
ggplot(df1, aes(y = value, x = product, fill = product, group = factor(start_date))) +
  geom_col(data = df1, position = position_dodge(width = 0.8),
           width = 0.7, inherit.aes = TRUE, size = 0) +
  xlab("Product") + ylab("Values") + ylim(0,60) +
  scale_fill_manual(values=c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
  # Mapping color to a constant string puts each crossbar into the legend
  stat_summary(aes(x = factor(product), y=value, color='mean1'),
               fun = "mean", geom = "crossbar",
               size = 1, width = 0.8, inherit.aes = FALSE) +
  stat_summary(data = df2, aes(x = factor(product),y=value, color='mean2'),
               fun = "mean", geom = "crossbar",
               size = 1, width = 0.8, inherit.aes = FALSE) +
  theme(legend.position=c(0.75,0.8), legend.direction = 'horizontal') +
  # 'mean1' sorts before 'mean2', so blue goes to df1's mean and red to df2's
  scale_color_manual(values=c('blue', 'red'))
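The adjusted code doesn't cover the last part of the question (showing each bar's date below it). One possible sketch, assuming a geom_text() layer that reuses the same position_dodge() as the bars so each label lands under its bar. The label format ("%m-%d"), the y = -1 offset, the text size and the expand_limits(y = -6) value are arbitrary choices not taken from the original answer; adjust them for your plot.

```r
library(ggplot2)

# The question's df1, written compactly: 4 products x 7 consecutive dates
df1 <- data.frame(
  product    = rep(c("A", "B", "C", "D"), each = 7),
  start_date = rep(as.Date("2020-02-01") + 0:6, times = 4),
  value      = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
                 5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
                 20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
                 34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)

# Reuse one dodge object so bars and labels get identical x offsets
dodge <- position_dodge(width = 0.8)

p <- ggplot(df1, aes(x = product, y = value, fill = product,
                     group = factor(start_date))) +
  geom_col(position = dodge, width = 0.7) +
  # One rotated label per bar, hanging just below y = 0 (offset chosen arbitrarily)
  geom_text(aes(y = -1, label = format(start_date, "%m-%d")),
            position = dodge, angle = 90, hjust = 1, size = 2.5,
            show.legend = FALSE) +
  # Make room below zero so the labels aren't clipped (value chosen by eye)
  expand_limits(y = -6) +
  xlab("Product") + ylab("Values")

p
```

Because group = factor(start_date) is set in ggplot(), geom_text() inherits it and is dodged exactly like geom_col().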
Explanation: The point of changing to group = factor(start_date) is to keep the bars within each product split by start_date and placed side by side, a concept known as "dodging". In your original code, color = factor(start_date) was inside aes(), so it created a legend (and a colored outline on every bar), and geom_col() also used it for dodging: x and fill are both mapped to product, so start_date was the only variable separating the bars within a product. If you remove color =, only one bar is visible for each product, because the bars for the different dates are drawn on top of each other. Even if you specify position = 'dodge', geom_col() would not dodge them, because there's no information about how to do that. That's why you include the group = aesthetic: it gives geom_col() the information it needs for dodging without adding a border or a legend.
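To see the dodging difference concretely, compare where ggplot2 places the bars with and without the group mapping. A minimal sketch, assuming ggplot2 and the question's df1 values; layer_data() returns the computed layer data, including each bar's x centre after dodging.

```r
library(ggplot2)

# The question's df1, written compactly: 4 products x 7 consecutive dates
df1 <- data.frame(
  product    = rep(c("A", "B", "C", "D"), each = 7),
  start_date = rep(as.Date("2020-02-01") + 0:6, times = 4),
  value      = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
                 5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
                 20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
                 34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)

dodge <- position_dodge(width = 0.8)

# No group mapping: product (x and fill) is the only discrete variable,
# so each product forms a single group and there is nothing to dodge
p_ungrouped <- ggplot(df1, aes(x = product, y = value, fill = product)) +
  geom_col(position = dodge, width = 0.7)

# group = factor(start_date): seven groups per product, dodged side by side
p_grouped <- ggplot(df1, aes(x = product, y = value, fill = product,
                             group = factor(start_date))) +
  geom_col(position = dodge, width = 0.7)

# Count the distinct bar centres computed by ggplot2
length(unique(as.numeric(layer_data(p_ungrouped)$x)))  # 4: the 7 bars per product overlap
length(unique(as.numeric(layer_data(p_grouped)$x)))    # 28: one slot per product and date
```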
Mapping an aesthetic inside aes() is also how you tell ggplot which legends to create. Aesthetics mapped to x or y are just used for positioning. The group aesthetic is used for dodging and other per-group behavior and doesn't get a legend, but basically any other aesthetic (size, shape, color, fill, linetype, etc.) produces one. If both stat_summary() calls include a color aesthetic, a single combined legend is created. The problem here is that there is no column to map to color (the two means come from two separate data frames), so we create one by mapping each layer to a constant character string ("mean1" and "mean2").
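A variation on the same idea, sketched here as an assumption rather than the answer's exact code: giving the constant strings descriptive names and passing a named vector to scale_colour_manual() ties each label to its colour explicitly instead of relying on alphabetical order. The labels "df1 mean" and "df2 mean" are hypothetical. Here the crossbars inherit x and y from ggplot(), so inherit.aes = FALSE isn't needed.

```r
library(ggplot2)

# The question's data, written compactly
df1 <- data.frame(
  product    = rep(c("A", "B", "C", "D"), each = 7),
  start_date = rep(as.Date("2020-02-01") + 0:6, times = 4),
  value      = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
                 5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
                 20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
                 34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)
df2 <- data.frame(
  product    = rep(c("A", "B", "C", "D"), each = 6),
  start_date = rep(as.Date("2019-07-09") + 0:5, times = 4),
  value      = c(9.06, 10.74, 14.64, 7.67, 8.72, 11.21,
                 4.76, 4.53, 3.81, 4.32, 3.95, 5.2,
                 20.36, 21.17, 19.51, 16.25, 17.93, 16.94,
                 14.51, 14.65, 23.28, 10.84, 16.71, 12.48)
)

p <- ggplot(df1, aes(x = product, y = value)) +
  # Each layer maps colour to a constant string; both strings end up in one legend
  stat_summary(aes(colour = "df1 mean"), fun = mean,
               geom = "crossbar", width = 0.8) +
  stat_summary(data = df2, aes(colour = "df2 mean"), fun = mean,
               geom = "crossbar", width = 0.8) +
  # Named values tie each label to its colour regardless of sort order;
  # name = NULL drops the legend title
  scale_colour_manual(name = NULL,
                      values = c("df1 mean" = "blue", "df2 mean" = "red"))

p
```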
Final point: it might be easier to plot this if you combine your data frames. You'll probably still want to record which one each row came from, so something like this works:
df1$origin_df <- 'df1'
df2$origin_df <- 'df2'
df <- rbind(df1, df2)
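Then plot with df instead of df1. You can then use a single stat_summary() call with color = origin_df inside aes(). If you only want bars for the df1 values, keep geom_col() restricted to those rows (for example, data = subset(df, origin_df == 'df1')), since otherwise the df2 dates are drawn as bars too.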
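Putting the combined-data suggestion together might look like the following sketch. It assumes the origin_df column described above; the bars are drawn only from the df1 rows, and the group mapping sits in geom_col() alone so that stat_summary() groups by product and origin_df only.

```r
library(ggplot2)

# The question's data, written compactly
df1 <- data.frame(
  product    = rep(c("A", "B", "C", "D"), each = 7),
  start_date = rep(as.Date("2020-02-01") + 0:6, times = 4),
  value      = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
                 5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
                 20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
                 34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)
df2 <- data.frame(
  product    = rep(c("A", "B", "C", "D"), each = 6),
  start_date = rep(as.Date("2019-07-09") + 0:5, times = 4),
  value      = c(9.06, 10.74, 14.64, 7.67, 8.72, 11.21,
                 4.76, 4.53, 3.81, 4.32, 3.95, 5.2,
                 20.36, 21.17, 19.51, 16.25, 17.93, 16.94,
                 14.51, 14.65, 23.28, 10.84, 16.71, 12.48)
)

# Tag each row with its source, then stack the two data frames
df1$origin_df <- "df1"
df2$origin_df <- "df2"
df <- rbind(df1, df2)

p <- ggplot(df, aes(x = product, y = value)) +
  # Bars from the df1 rows only; group is set here so it doesn't split the summary layer
  geom_col(data = subset(df, origin_df == "df1"),
           aes(fill = product, group = factor(start_date)),
           position = position_dodge(width = 0.8), width = 0.7) +
  # One summary layer: colour = origin_df gives one crossbar per product and source
  stat_summary(aes(colour = origin_df), fun = mean,
               geom = "crossbar", width = 0.8) +
  scale_colour_manual(values = c(df1 = "blue", df2 = "red"))

p
```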