Joep_S
Joep_S

Reputation: 537

calculating cumulatives within a group correctly

I hope anyone can help with this. I have a data frame similar to this:

test <- data.frame(ID = c(1:24),
                  group = rep(c(1,1,1,1,1,1,2,2,2,2,2,2),2),
                  year1 = rep(c(2018,2018,2018,2019,2019,2019),4),
                  month1 = rep(c(1,2,3),8))

Now I want to do a cumsum per group but when I use the following code the sumsum 'restarts' each year.

test2 <-test %>% 
  group_by(group,year1,month1) %>% 
  summarise(a = length(unique(ID)))  %>%
  mutate(a = cumsum(a))

My desired output is:

   group year1 month1  a
1      1  2018      1  2
2      1  2018      2  4
3      1  2018      3  6
4      1  2019      1  8
5      1  2019      2 10
6      1  2019      3 12
7      2  2018      1  2
8      2  2018      2  4
9      2  2018      3  6
10     2  2019      1  8
11     2  2019      2 10
12     2  2019      3 12

Upvotes: 1

Views: 75

Answers (2)

akrun
akrun

Reputation: 887088

With data.table, this can be done with

library(data.table)
setDT(test)[, .(a = uniqueN(ID)), by = .(group, year1, month1)
        ][, a := cumsum(a), by = group]

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

You could first count unique ID for each group, month and year and then take cumsum of it for each group.

library(dplyr)

test %>%
  group_by(group, year1, month1) %>%
  summarise(a = n_distinct(ID)) %>%
  group_by(group) %>%
  mutate(a = cumsum(a)) 

#   group year1 month1     a
#   <dbl> <dbl>  <dbl> <int>
# 1     1  2018      1     2
# 2     1  2018      2     4
# 3     1  2018      3     6
# 4     1  2019      1     8
# 5     1  2019      2    10
# 6     1  2019      3    12
# 7     2  2018      1     2
# 8     2  2018      2     4
# 9     2  2018      3     6
#10     2  2019      1     8
#11     2  2019      2    10
#12     2  2019      3    12

Upvotes: 1

Related Questions