How to calculate the number of distinct values encountered to date in a group using dplyr

Question

I have customer data grouped by customer ID and ordered by date of purchase. I want to add a column that cumulatively counts the number of distinct products that have been ordered to date - i.e:

Input <- data.frame(Customer = c("C-01", "C-01", "C-02", "C-02", "C-02", "C-02", "C-03", "C-03", "C-03", "C-03"),
              Product = c("COKE", "COKE", "FRIES", "SHAKE", "BURGER", "BURGER", "CHICKEN", "FISH", "FISH", "FISH"),
              Date = c("2018-01-02","2018-01-05","2018-01-03","2018-01-06","2018-01-08","2018-01-12","2018-01-02","2018-01-04", "2018-01-16", "2018-01-20"))

Output <- data.frame(Customer = c("C-01", "C-01", "C-02", "C-02", "C-02", "C-02", "C-03", "C-03", "C-03", "C-03"),
                Product = c("COKE", "COKE", "FRIES", "SHAKE", "BURGER", "BURGER", "CHICKEN", "FISH", "FISH", "FISH"),
                Date = c("2018-01-02","2018-01-05","2018-01-03","2018-01-06","2018-01-08","2018-01-12","2018-01-02","2018-01-04", "2018-01-16", "2018-01-20"),
                Cum_Distinct = c(1, 1, 1, 2, 3, 3, 1, 2, 2, 2))

With the Input data set, I want to create the output data using dplyr. How do I keep a cumulative count of distinct products encountered to date?

Ronak Shah · Accepted Answer

We could take cumulative sum over non-duplicated values by group.

library(dplyr)

Input %>%
  group_by(Customer) %>%
  mutate(Cum_Distinct = cumsum(!duplicated(Product)))


# Customer  Product Date       Cum_Distinct
#                
# 1 C-01     COKE    2018-01-02     1
# 2 C-01     COKE    2018-01-05     1
# 3 C-02     FRIES   2018-01-03     1
# 4 C-02     SHAKE   2018-01-06     2
# 5 C-02     BURGER  2018-01-08     3
# 6 C-02     BURGER  2018-01-12     3
# 7 C-03     CHICKEN 2018-01-02     1
# 8 C-03     FISH    2018-01-04     2
# 9 C-03     FISH    2018-01-16     2
#10 C-03     FISH    2018-01-20     2

How to calculate the number of distinct values encountered to date in a group using dplyr

Answers (2)

unnest

Related Questions