calculating cumulatives within a group correctly

Question

I hope anyone can help with this. I have a data frame similar to this:

test <- data.frame(ID = c(1:24),
                  group = rep(c(1,1,1,1,1,1,2,2,2,2,2,2),2),
                  year1 = rep(c(2018,2018,2018,2019,2019,2019),4),
                  month1 = rep(c(1,2,3),8))

Now I want to do a cumsum per group but when I use the following code the sumsum 'restarts' each year.

test2 <-test %>% 
  group_by(group,year1,month1) %>% 
  summarise(a = length(unique(ID)))  %>%
  mutate(a = cumsum(a))

My desired output is:

   group year1 month1  a
1      1  2018      1  2
2      1  2018      2  4
3      1  2018      3  6
4      1  2019      1  8
5      1  2019      2 10
6      1  2019      3 12
7      2  2018      1  2
8      2  2018      2  4
9      2  2018      3  6
10     2  2019      1  8
11     2  2019      2 10
12     2  2019      3 12

Ronak Shah · Accepted Answer

You could first count unique ID for each group, month and year and then take cumsum of it for each group.

library(dplyr)

test %>%
  group_by(group, year1, month1) %>%
  summarise(a = n_distinct(ID)) %>%
  group_by(group) %>%
  mutate(a = cumsum(a)) 

#   group year1 month1     a
#       
# 1     1  2018      1     2
# 2     1  2018      2     4
# 3     1  2018      3     6
# 4     1  2019      1     8
# 5     1  2019      2    10
# 6     1  2019      3    12
# 7     2  2018      1     2
# 8     2  2018      2     4
# 9     2  2018      3     6
#10     2  2019      1     8
#11     2  2019      2    10
#12     2  2019      3    12

calculating cumulatives within a group correctly

Answers (2)

Related Questions