Reputation: 602
I have a dataset with a bunch of observations by year. I just want to calculate the percentages of "fail" and "attend" by year, and then plot both yearly trends with geom_line() on the same plot. I got started with the code below, but it's not quite right. It needs to be collapsed by year, I think?
Code:
df %>%
group_by(year) %>%
mutate(perc_fail = fail/sum(fail),
perc_attend = attend/sum(attend)) %>%
ggplot(., aes(x = year)) +
geom_line()
Data:
df <- structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L), .Label = c("2000", "2001", "2002", "2003"
), class = "factor"), fail = c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0), attend = c(1, 1, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,
1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -60L), spec = structure(list(
cols = list(year = structure(list(), class = c("collector_double",
Upvotes: 0
Views: 40
Reputation: 21992
You can use summarise() rather than mutate() to get a single value per year and then plot. Because fail and attend are 0/1 indicators, mean() gives the proportion for each year. Note that when you're plotting different series from different variables, you can put the label you want in the legend inside aes(), as I did for colour in both geom_line() calls.
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
  group_by(year) %>%
  summarise(perc_fail = mean(fail),
            perc_attend = mean(attend)) %>%
  ggplot(., aes(x = year, group = 1)) +
  geom_line(aes(y = perc_fail, colour = "Fail")) +
  geom_line(aes(y = perc_attend, colour = "Attend")) +
  labs(y = "Percent",
       x = "Year",
       colour = "") +
  scale_y_continuous(labels = ~scales::percent(.x))
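To see the difference between the two verbs without plotting, here is a minimal sketch on a small made-up data frame (toy, kept and collapsed are hypothetical names, not the df from the question). It only compares how many rows come back from mutate() and summarise() after group_by().

```r
library(dplyr)

# Hypothetical toy data, not the df from the question
toy <- data.frame(
  year   = factor(c("2000", "2000", "2001", "2001")),
  fail   = c(0, 1, 0, 0),
  attend = c(1, 1, 1, 0)
)

# mutate() keeps every original row and just adds a column
kept <- toy %>%
  group_by(year) %>%
  mutate(perc_fail = mean(fail)) %>%
  ungroup()

# summarise() collapses each year to a single row
collapsed <- toy %>%
  group_by(year) %>%
  summarise(perc_fail = mean(fail),
            perc_attend = mean(attend))

nrow(kept)       # one row per observation
nrow(collapsed)  # one row per year
```

With one row per year, each geom_line() call has exactly one y value per x position, which is what the plot needs.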
You could also pivot the data to long format and use stat_summary() to compute the yearly means for you. The code below produces essentially the same graph, with year on a numeric axis and the legend labels taken from the column names.
df %>%
  mutate(year = as.numeric(as.character(year))) %>%
  pivot_longer(c("fail", "attend"), names_to = "status", values_to = "vals") %>%
  ggplot(aes(x = year, y = vals, colour = status)) +
  stat_summary(fun = mean, geom = "line") +
  labs(y = "Percent",
       x = "Year",
       colour = "") +
  scale_y_continuous(labels = ~scales::percent(.x))
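If you want to check the numbers that stat_summary(fun = mean) draws, a rough equivalent is to compute the means per year and status yourself after pivoting. This is only a sketch on made-up data (toy, long_means and share are hypothetical names), assuming the same 0/1 coding as in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the df from the question
toy <- data.frame(
  year   = c(2000, 2000, 2001, 2001),
  fail   = c(0, 1, 0, 0),
  attend = c(1, 1, 1, 0)
)

# One row per year/status pair, holding the mean of the 0/1 values
long_means <- toy %>%
  pivot_longer(c("fail", "attend"), names_to = "status", values_to = "vals") %>%
  group_by(year, status) %>%
  summarise(share = mean(vals), .groups = "drop")

long_means
```

The resulting table could be passed straight to ggplot() with geom_line(aes(x = year, y = share, colour = status)) if you prefer to keep the summary step explicit.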
Upvotes: 1