Reputation: 242
I want to plot the cumulative counts of level OK of factor X (*), over time (column Date). I am not sure what the best strategy is: whether I should create a new data frame with a summary column, or whether there is a ggplot2 way of doing this.
Sample data
DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-03-01", "2018-04-01")),
  X = factor(rep("OK", 6), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B"))
)
DF <- rbind(DF, list(as.Date("2018-02-01"), factor("NOK"), "A"))
From similar questions I tried this:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='bin')
Using stat='count' (as in the answer to this question) is even worse:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='count')
which shows the counts for factor levels (*), but not the accumulation over time.
Desperate measure - count with table
I tried creating a new data frame with counts using table, like this:
cum <- as.data.frame(table(DF$Date, DF$Group))
ggplot(cum, aes(Var1, cumsum(Freq), col = Var2, group = Var2)) +
geom_line()
Is there a way to do this with ggplot2? Do I need to create a new column with cumsum? If so, how should I cumsum the factor levels, by date?
(*) Note: I could just filter the data frame to keep only the intended level with DF[DF$X == "OK", ], but I am sure someone can find a smarter solution.
Upvotes: 1
Views: 1841
Reputation: 20095
One option using dplyr and ggplot2:
library(dplyr)
library(ggplot2)
DF %>%
  group_by(Group) %>%
  arrange(Date) %>%                      # sort by date so the running total follows time
  mutate(Value = cumsum(X == "OK")) %>%  # running count of OK rows within each group
  ggplot(aes(Date, y = Value, group = Group, col = Group)) +
  geom_line()
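If you would rather not load dplyr, the same running count can probably be done in base R with ave(). This is only a minimal sketch: it rebuilds the sample data from the question inline so the snippet is self-contained, it reuses the Value column name from above, and the plotting part only runs if ggplot2 is installed.

```r
# Sample data from the question, rebuilt in one step (the last row is the NOK one).
DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                   "2018-03-01", "2018-04-01", "2018-02-01")),
  X = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

# Sort by date so the running total follows time order.
DF <- DF[order(DF$Date), ]

# ave() applies cumsum separately within each Group;
# X == "OK" is TRUE/FALSE, converted to 1/0, so NOK rows add nothing.
DF$Value <- ave(as.integer(DF$X == "OK"), DF$Group, FUN = cumsum)

# Same plot as in the dplyr version, guarded so the sketch runs without ggplot2.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(DF, aes(Date, Value, col = Group)) + geom_line()
  print(p)
}
```

Because the NOK row contributes 0 to the sum, there is no need to filter the data frame first; the line for group A simply stays flat at that row.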
Upvotes: 3