Calculate number of occurrences within a specific time period

Question

I have the folllowing data, where ID stands for an individual, Date for the date, and Purchased for whether somebody made a purchase (I made this last one so that I can count the the occurences):

   ID       Date Purchased
1   1 2017-01-01         1
2   1 2017-08-03         1
3   1 2017-09-02         1
4   2 2017-09-04         1
5   2 2018-07-12         1
6   2 2018-11-03         1
7   2 2018-12-05         1
8   2 2019-01-01         1
9   3 2018-02-03         1
10  3 2020-02-03         1
11  3 2020-03-01         1

I would like to create a variable called "Frequency" that calculates the number of times an individual has made a purchase in the past year by summing up all the "Purchased" before the specific Date you see in the data frame.

So for example, for row 3 this would lead to a "Frequency" of 2 since 2017-01-01 and 2017-08-03 are both within a one-year time period from 2017-09-02 (so within the interval of 2016-09-02 and 2017-09-01).
See desired output:

   ID       Date Purchased Frequency
1   1 2017-01-01         1         0
2   1 2017-08-03         1         1
3   1 2017-09-02         1         2
4   2 2017-09-04         1         0
5   2 2018-07-12         1         1
6   2 2018-11-03         1         1
7   2 2018-12-05         1         2
8   2 2019-01-01         1         3
9   3 2018-02-03         1         0
10  3 2020-02-03         1         0
11  3 2020-03-01         1         1

To reproduce the dataframe:

df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3), Date = as.Date(c('2017-01-01', '2017-08-03', '2017-09-02', '2017-09-04', '2018-07-12', '2018-11-03', '2018-12-05', '2019-01-01', '2018-02-03', '2020-02-03', '2020-03-01')), Purchased = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ))

I've searched on stackoverlow but haven't been able to find an answer yet that I'm able to apply to my situation and obtain the desired results. One of the things that I found and tried was this:

df$frequency <-
sapply(df$Date, function(x){
sum(df$Date < x & df$Date >= x - 365)
})

I believe this might give me the results I want if I can find a way to include that it groups by ID (so it sums per ID and not overall). Can't say for sure of course since I haven't been able to test it out. Any help is much appreciated.

Ronak Shah · Accepted Answer

Here's a tidyverse solution :

library(dplyr)
library(purrr)
library(lubridate)

df %>%
  group_by(ID) %>%
  mutate(Frequency = map_dbl(Date, 
                     ~sum(Purchased[between(Date, .x - years(1), .x - 1)]))) %>%
  ungroup

#      ID Date       Purchased Frequency
#                  
# 1     1 2017-01-01         1         0
# 2     1 2017-08-03         1         1
# 3     1 2017-09-02         1         2
# 4     2 2017-09-04         1         0
# 5     2 2018-07-12         1         1
# 6     2 2018-11-03         1         1
# 7     2 2018-12-05         1         2
# 8     2 2019-01-01         1         3
# 9     3 2018-02-03         1         0
#10     3 2020-02-03         1         0
#11     3 2020-03-01         1         1

The logic of the code is for every Date in each ID it sums the Purchased value between current date - 1 year and current date - 1 day.

Calculate number of occurrences within a specific time period

Answers (2)

Related Questions