philsf

Reputation: 242

ggplot2 - geom_line of cumulative counts of factor levels

I want to plot the cumulative count of level OK of factor X (*) over time (column Date), with one line per Group. I am not sure what the best strategy is: should I create a new data frame with a summary column, or is there a ggplot2 way of doing this?

Sample data

DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-03-01", "2018-04-01") ),
  X = factor(rep("OK", 6), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B"))
)
DF <- rbind(DF, list(as.Date("2018-02-01"), factor("NOK"), "A"))

From similar questions I tried this:

ggplot(DF, aes(Date, col = Group)) + geom_line(stat='bin')


Using stat='count' (as suggested in the answer to a similar question) is even worse:

ggplot(DF, aes(Date, col = Group)) + geom_line(stat='count')


This shows the counts per date (for all levels of X (*)), but not their accumulation over time.

Desperate measure - count with table

I tried creating a new data frame with counts using table like this:

cum <- as.data.frame(table(DF$Date, DF$Group))
ggplot(cum, aes(Var1, cumsum(Freq), col = Var2, group = Var2)) +
  geom_line()
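The table attempt goes wrong for two reasons: table() counts rows of every X level (including NOK), and cumsum(Freq) inside aes() is evaluated over the whole Freq column before ggplot2 splits the data by group, so group B's line continues from group A's total. A minimal base-R sketch of a per-group running total over the same kind of table, assuming only OK rows should be counted (the sample data is rebuilt in a single data.frame() call so the block runs on its own):

```r
# Question's sample data, rebuilt in one call (the NOK row is the 7th row)
DF <- data.frame(
  Date  = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                    "2018-03-01", "2018-04-01", "2018-02-01")),
  X     = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

# Keep only the level of interest before counting
ok <- DF[DF$X == "OK", ]

# One row per Date x Group combination, with zero counts filled in
cum <- as.data.frame(table(Date = ok$Date, Group = ok$Group))
cum$Date <- as.Date(as.character(cum$Date))

# Sort by Group, then Date, and accumulate within each Group only
cum <- cum[order(cum$Group, cum$Date), ]
cum$Cum <- ave(cum$Freq, cum$Group, FUN = cumsum)
print(cum)
```

The result can be passed to a plotting call like the question's, with Cum in place of cumsum(Freq), for example ggplot(cum, aes(Date, Cum, col = Group)) + geom_line(). Because table() fills in zero counts, each Group's line spans every date in the data and stays flat where nothing happened.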


Is there a way to do this with ggplot2? Do I need to create a new column with cumsum? If so, how should I compute the cumsum of the factor levels by date?

(*) Note: I could simply filter the data frame to keep only the intended level with DF[DF$X == "OK", ], but I suspect someone can find a smarter solution.

Upvotes: 1

Views: 1841

Answers (1)

MKR

Reputation: 20095

One option, using dplyr and ggplot2:

library(dplyr)
library(ggplot2)

DF %>%
  group_by(Group) %>%
  arrange(Date) %>%                        # sort by date before accumulating
  mutate(Value = cumsum(X == "OK")) %>%    # running count of "OK" rows within each Group
  ggplot(aes(Date, y = Value, group = Group, col = Group)) +
  geom_line()
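In group A two OK rows share the date 2018-01-01, so the answer's Value goes from 1 to 2 at the same x position and geom_line() draws a short vertical segment there. If one point per date is preferred, the following sketch first collapses the rows to one count per Group and Date and then accumulates. It is written in base R with aggregate() and ave() as a swapped-in alternative to the answer's dplyr pipeline, and the helper column n is introduced here purely for illustration:

```r
# Question's sample data, rebuilt in one call (the NOK row is the 7th row)
DF <- data.frame(
  Date  = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                    "2018-03-01", "2018-04-01", "2018-02-01")),
  X     = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

# Keep OK rows and give each one a weight of 1 (hypothetical helper column n)
ok <- DF[DF$X == "OK", ]
ok$n <- 1

# One row per Group and Date, holding the number of OK rows on that date
daily <- aggregate(n ~ Group + Date, data = ok, FUN = sum)

# Sort by Group, then Date, and take the running total within each Group
daily <- daily[order(daily$Group, daily$Date), ]
daily$Value <- ave(daily$n, daily$Group, FUN = cumsum)
print(daily)
```

daily can be plotted with a call like the answer's, e.g. ggplot(daily, aes(Date, Value, col = Group)) + geom_line(); swapping geom_line() for geom_step() would draw the running total as a staircase instead of interpolating between dates.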


Upvotes: 3
