Reputation: 479
I have some data recorded over a period of months from 5 treatments (incl. control). I'm using ggplot to plot the data as a time series and have generated a data frame of the means of the raw data and standard error for each date.
I'm trying to plot all five treatments on the same graph and show the error bars with it. So far I can a) plot one treatment group with its error bars, and b) plot all five treatments but without the error bars.
Here's my data (I've only included two treatments to keep things tidy here):
       dates c_mean_am  c_se_am T1_mean_am T1_se_am
1 2017-01-31   284.135 27.43111    228.935 23.39037
2 2017-02-09   226.944 13.08237    173.241 13.42946
3 2017-02-23   281.135 15.89709    252.665 20.73417
4 2017-03-14   265.655 15.29930    238.225 17.47501
5 2017-04-06   312.785 13.08237    237.485 13.42946
Here's my code to achieve option a) above
ggplot(summary, aes(x = dates, y = c_mean_am)) +
  geom_point(shape = 19, size = 2, color = "blue") +
  geom_line(color = "blue") +
  geom_errorbar(aes(ymin = c_mean_am - c_se_am, ymax = c_mean_am + c_se_am), color = "blue", width = 0.25) +
  xlab("Date")  # ggplot() ignores an xlab argument, so the label is added as its own layer
Here's the code for option b) above
sp <- ggplot(summary, aes(dates, y = Cond, color = Treatment)) +
  geom_line(aes(y = c_mean_am, color = "Control")) +
  geom_line(aes(y = T1_mean_am, color = "T1")) +
  geom_point(aes(y = c_mean_am, color = "Control")) +
  geom_point(aes(y = T1_mean_am, color = "T1"))
sp2 <- sp +
  scale_color_manual(breaks = c("Control", "T1", "T2"), values = c("blue", "yellow"))
sp2
How can I get the error bars on the second plot using the same colours as the points and lines?
Thanks
AB
Upvotes: 1
Views: 5122
Reputation: 5204
The accepted answer seems to contain an error in the way the data was gathered with gather() (aka pivot_longer() in tidyr >= 1.0.0), which duplicated each point and error bar. The duplicated error bars are evident, but if you replace geom_point() with geom_jitter() you'll also see the two points that correspond to the two error bars. This has caused some confusion to others, so I wanted to offer a corrected solution for posterity.
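To make the duplication concrete, here is a minimal sketch that runs the two chained gather() calls on the first two dates from the question and counts the rows. The names step1 and step2 are only illustrative, and the data frame is rebuilt here so the snippet runs on its own.

library(tidyr)

# First two dates from the question's summary table (subset for brevity)
df <- data.frame(
  dates      = c("2017-01-31", "2017-02-09"),
  c_mean_am  = c(284.135, 226.944),
  c_se_am    = c(27.43111, 13.08237),
  T1_mean_am = c(228.935, 173.241),
  T1_se_am   = c(23.39037, 13.42946)
)

# The first gather stacks the two mean columns: 2 dates x 2 treatments = 4 rows,
# and every row still carries BOTH c_se_am and T1_se_am
step1 <- gather(df, mean_type, mean_val, c_mean_am, T1_mean_am)

# The second gather stacks the two SE columns, so each mean is paired with
# both standard errors: 4 rows x 2 = 8 rows
step2 <- gather(step1, se_type, se_val, c_se_am, T1_se_am)

nrow(step2)
# Cross-tabulate to see that every mean_type is paired with every se_type
table(step2$mean_type, step2$se_type)

Each treatment's mean ends up matched with the other treatment's standard error as well as its own, which is where the extra error bars come from.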
Here's another approach to that pivot that avoids this duplication:
# load necessary packages
library(tidyverse)
# create data from question
df <-
  structure(
    list(
      dates = c("2017-01-31", "2017-02-09", "2017-02-23", "2017-03-14", "2017-04-06"),
      c_mean_am = c(284.135, 226.944, 281.135, 265.655, 312.785),
      c_se_am = c(27.43111, 13.08237, 15.89709, 15.2993, 13.08237),
      T1_mean_am = c(228.935, 173.241, 252.665, 238.225, 237.485),
      T1_se_am = c(23.39037, 13.42946, 20.73417, 17.47501, 13.42946)
    ),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5")
  )
# pivot df long and confirm that there's only one value per group per timepoint
df_long <- df %>%
  pivot_longer(
    cols = -dates,
    names_to = c("treatment_group", ".value"),
    names_pattern = "(.*)_(.*_am)"
  )
df_long
# # A tibble: 10 x 4
#    dates      treatment_group mean_am se_am
#    <chr>      <chr>             <dbl> <dbl>
#  1 2017-01-31 c                  284.  27.4
#  2 2017-01-31 T1                 229.  23.4
#  3 2017-02-09 c                  227.  13.1
#  4 2017-02-09 T1                 173.  13.4
#  5 2017-02-23 c                  281.  15.9
#  6 2017-02-23 T1                 253.  20.7
#  7 2017-03-14 c                  266.  15.3
#  8 2017-03-14 T1                 238.  17.5
#  9 2017-04-06 c                  313.  13.1
# 10 2017-04-06 T1                 237.  13.4
Now you can plot and get the expected graph with only a single error bar and single point for each group at each timepoint.
df_long %>%
  ggplot(aes(x = dates, y = mean_am, colour = treatment_group)) +
  geom_line(aes(group = treatment_group)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_am - se_am, ymax = mean_am + se_am))
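The question also asked for the error bars to use the same colours as the points and lines. Because colour is mapped once in the top-level aes(), every layer inherits it, and scale_colour_manual() can then set the actual colours. The following is a sketch only: the colour values are taken from the question, the "Control" label and the two-day bar width are my assumptions, and the data frame is a two-date subset rebuilt so the snippet runs on its own.

library(ggplot2)

# Long-format summary for the first two dates (same values as df_long),
# with dates converted to Date so the x axis is continuous
df_long <- data.frame(
  dates = as.Date(c("2017-01-31", "2017-01-31", "2017-02-09", "2017-02-09")),
  treatment_group = c("c", "T1", "c", "T1"),
  mean_am = c(284.135, 228.935, 226.944, 173.241),
  se_am = c(27.43111, 23.39037, 13.08237, 13.42946)
)

p <- ggplot(df_long, aes(x = dates, y = mean_am, colour = treatment_group)) +
  geom_line() +
  geom_point(shape = 19, size = 2) +
  # on a Date axis, width is measured in days (2 is an arbitrary choice)
  geom_errorbar(aes(ymin = mean_am - se_am, ymax = mean_am + se_am), width = 2) +
  # one manual scale colours lines, points and error bars alike
  scale_colour_manual(
    breaks = c("c", "T1"),
    labels = c("Control", "T1"),
    values = c(c = "blue", T1 = "yellow")
  ) +
  labs(x = "Date", colour = "Treatment")
p

With dates stored as Date rather than character, geom_line() no longer needs an explicit group aesthetic, since the colour mapping alone defines one line per treatment.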
Upvotes: 1
Reputation: 7153
Transform your data into long-form first:
df <- df %>%
  gather(mean_type, mean_val, c_mean_am, T1_mean_am) %>%
  gather(se_type, se_val, c_se_am, T1_se_am)
ggplot(df, aes(dates, mean_val, colour = mean_type)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean_val - se_val, ymax = mean_val + se_val))
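Note that two chained gather() calls pair every mean with every standard error, so each date ends up with two rows per treatment. One possible way to keep this two-step approach is to filter down to rows where the mean and the SE come from the same treatment. The sketch below assumes the column names follow the prefix_mean_am / prefix_se_am pattern from the question; df_fixed is just an illustrative name.

library(dplyr)
library(tidyr)
library(ggplot2)

# Summary data from the question, with dates stored as Date
df <- data.frame(
  dates = as.Date(c("2017-01-31", "2017-02-09", "2017-02-23", "2017-03-14", "2017-04-06")),
  c_mean_am = c(284.135, 226.944, 281.135, 265.655, 312.785),
  c_se_am = c(27.43111, 13.08237, 15.89709, 15.2993, 13.08237),
  T1_mean_am = c(228.935, 173.241, 252.665, 238.225, 237.485),
  T1_se_am = c(23.39037, 13.42946, 20.73417, 17.47501, 13.42946)
)

df_fixed <- df %>%
  gather(mean_type, mean_val, c_mean_am, T1_mean_am) %>%
  gather(se_type, se_val, c_se_am, T1_se_am) %>%
  # keep only rows whose mean and SE share the same treatment prefix ("c" or "T1")
  filter(sub("_.*", "", mean_type) == sub("_.*", "", se_type))

nrow(df_fixed)  # 5 dates x 2 treatments = 10 rows

p <- ggplot(df_fixed, aes(dates, mean_val, colour = mean_type)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean_val - se_val, ymax = mean_val + se_val))
p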
Edit: explanation of the tidyr manipulation
new.dat <- mtcars %>%  # taking mtcars as the starting data.frame
  select(gear, cyl, mpg, qsec) %>%
  # equivalent to mtcars[, c("gear", "cyl", "mpg", "qsec")]; to simplify the example
  gather(key = type, value = val, gear, cyl)
# convert the data into a long form with 64 rows, with new character column "type" and numeric column "val". "gear" and "cyl" are removed while "mpg" and "qsec" remain
new.dat[c(1:3, 33:35), ]
#     mpg  qsec type val
# 1  21.0 16.46 gear   4
# 2  21.0 17.02 gear   4
# 3  22.8 18.61 gear   4
# 33 21.0 16.46  cyl   6
# 34 21.0 17.02  cyl   6
# 35 22.8 18.61  cyl   4
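For readers on tidyr >= 1.0.0, where gather() is superseded, roughly the same reshaping can be sketched with pivot_longer(). The name new_dat_pl is only illustrative. The result has the same 64 rows, but the row order differs: pivot_longer() emits the gear and cyl rows for each car next to each other rather than stacking all gear rows before all cyl rows.

library(dplyr)
library(tidyr)

new_dat_pl <- mtcars %>%
  select(gear, cyl, mpg, qsec) %>%
  # stack "gear" and "cyl" into a name column ("type") and a value column ("val")
  pivot_longer(cols = c(gear, cyl), names_to = "type", values_to = "val")

nrow(new_dat_pl)        # 32 cars x 2 stacked columns = 64 rows
table(new_dat_pl$type)  # 32 rows each for "cyl" and "gear"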
With the data in long form, you can use the new identifier column ("type") for plotting purposes, e.g.
ggplot(new.dat, aes(val, mpg, fill = type)) +
  geom_col(position = "dodge")
The long format is also useful for plotting on separate facets, e.g.
ggplot(new.dat, aes(val, mpg, colour = type)) +
  geom_point() +
  facet_wrap(~type)
Upvotes: 1