Jeremy R. Johnson
Jeremy R. Johnson

Reputation: 305

Cumulative Count of Members by Month DPLYR

I have a list of members by signup month, what I'd like to do is create a data frame with total members by month.

Raw Data

month.list <- structure(c(18444, 18687, 18475, 18506, 18536, 18567, 18597,18718, 18659, 18628, 18779, 18748), class = "Date") 
total.membership.working <- structure(list(`Mem Account` = c(26137295, 26139796, 26400007,26400455, 26402031, 26402078, 26402239, 1092287142, 1092295228,1092473120), Month = structure(c(18444, 18687, 18444, 18444,18475, 18475, 18444, 18779, 18779, 18779), class = "Date")), row.names = c(NA,-10L), groups = structure(list(`Mem Account` = c(26137295, 26139796,26400007, 26400455, 26402031, 26402078, 26402239, 1092287142,1092295228, 1092473120), .rows = structure(list(1L, 2L, 3L, 4L,5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of","vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame"), .drop = TRUE), class = c("grouped_df","tbl_df", "tbl", "data.frame")) 

I have written a for loop that will accomplish this, but I'm hoping to find a Tidy way to do it without the loop.

For Loop

total.membership <- data.frame()
for(i in 1:length(month.list)) {
  
  foo <- total.membership.working %>%
    ungroup() %>%
    filter(Month <= month.list[i]) %>%
    summarise(Month = max(Month),
      total_membership = n_distinct(`Mem Account`))
  total.membership <- total.membership %>%
    bind_rows(foo)
}

Desired Output

total.membership <- structure(list(Month = structure(c(18444, 18687, 18475, 18506,18536, 18567, 18597, 18628, 18779, 18748), class = "Date"), total_membership = c(45886L,58128L, 47878L, 49214L, 51119L, 53390L, 55200L, 56299L, 60503L,59583L)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 10L, 11L,12L), class = "data.frame") 

> total.membership
        Month total_membership
1  2020-07-01            45886
2  2021-03-01            58128
3  2020-08-01            47878
4  2020-09-01            49214
5  2020-10-01            51119
6  2020-11-01            53390
7  2020-12-01            55200
8  2021-04-01            58902
9  2021-02-01            57238
10 2021-01-01            56299
11 2021-06-01            60503
12 2021-05-01            59583

Upvotes: 0

Views: 75

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388982

Try this code to calculate cumulative unique accounts in each month.

library(dplyr)

total.membership.working %>%
  ungroup %>%
  arrange(Month) %>%
  mutate(cum_n = cumsum(!duplicated(`Mem Account`))) %>%
  group_by(Month) %>% 
  summarise(cum_unique_entries = max(cum_n))

#  Month      cum_unique_entries
#  <date>                  <int>
#1 2020-07-01                  4
#2 2020-08-01                  6
#3 2021-03-01                  7
#4 2021-06-01                 10

Upvotes: 1

Related Questions