Convert year data frame to category count data frame in R

Question

I have a data frame organized like the sample below:

set.seed(661)

raw <- data.frame(
    year = 1900:2020,
    cat = sample(c("A", "B", "C", "D", "E"), size = 121, replace = TRUE)
)

head(raw)

  year cat
1 1900   B
2 1901   A
3 1902   E
4 1903   E
5 1904   B
6 1905   A

However, I would like to transform this into a data frame where I have a cumulative count for every unique value of raw$cat for each year. This way I can track the total count of each category from 1900 to raw$year==someX. The desired data frame would like something like the below:

desiredFinal <- data.frame(
    year = c(rep(1901, 5), rep(1902, 5)),
    cat = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E"),
    count = c(0, 1, 0, 0, 0, 1, 1, 0, 0, 0)
)

desiredFinal

   time cat count
1  1900   A     0
2  1900   B     1
3  1900   C     0
4  1900   D     0
5  1900   E     0
6  1901   A     1
7  1901   B     1
8  1901   C     0
9  1901   D     0
10 1901   E     0

I tried the below snippet of code:

test <- raw %>%
    group_by(cat) %>%
    transmute(year = year, 
              count = seq(n())
              )

But this didn't give me a view of every category for each year.

I also considered creating a series of binary variables for each category in a wide format for each year (ie when raw$cat=="B" then raw$catB==1 and raw$catA==0 and so) then reshaping it, but wasn't sure if this was the most efficient solution.

I would prefer a solution to this problem in base R, but I would also appreciate a solution in dplyr!

Thanks so much!

Ronak Shah · Accepted Answer

Use complete to create every cat value in each year, for each cat get the cumulative sum of count.

library(dplyr)
library(tidyr)

raw %>%
  mutate(count = 1) %>%
  complete(year, cat, fill = list(count = 0)) %>%
  group_by(cat) %>%
  mutate(count = cumsum(count)) %>%
  ungroup

# A tibble: 605 x 3
#    year cat   count
#     
# 1  1900 A         0
# 2  1900 B         1
# 3  1900 C         0
# 4  1900 D         0
# 5  1900 E         0
# 6  1901 A         1
# 7  1901 B         1
# 8  1901 C         0
# 9  1901 D         0
#10  1901 E         0
# … with 595 more rows

In base R, you can do this as :

dat <- expand.grid(year = unique(raw$year), cat = unique(raw$cat))
raw$count <- 1
dat1 <- merge(dat, raw, all.x = TRUE)
dat1$count[is.na(dat1$count)] <- 0
dat1 <- transform(dat1, count = ave(count, cat, FUN = cumsum))

Convert year data frame to category count data frame in R

Answers (2)

Related Questions