Pat Taggart
Pat Taggart

Reputation: 375

Count records per group within date lag

I have a large dataset structured as follows

dat <- data.frame("Date" = c("2017-01-01", "2017-01-20", "2017-04-15", "2017-11-03", "2018-01-03", "2019-12-12", "2020-03-07", "2014-05-19", "2009-07-03", "2010-02-04"), 
                  "Region" = c("Adelaide", "Albany", "Albany", "Albany", "Albany", "Albany", "Albany", "Middleton", "Alice Springs", "Alice Springs"))

dat$Date <- as.Date(dat$Date, "%Y-%m-%d")

For each record I would like to count the number records if they are within the previous 12 months and only if they are within the same region. Any help in achieving this would be greatly appreciated?

My final dataset should look the same as the original dataset but with the addition of the below count column

dat$Count <- c(0, 0, 1, 2, 3, 0, 1, 0, 0, 1)

This count column shows that for the first record in the dataset there are no other records from the same region that occurred in the previous 12 months, however for 5th record in the dataset there are 3 other records that occurred in the previous 12 months within the same region.

I have tried a range of solutions in dplyr using group_by(), count() and summarise() but haven't come close to my intention yet.

Upvotes: 2

Views: 150

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388817

For each Region you can take the difference of the current date with all other dates and count the number of dates that are between 0 and 365 days.

library(dplyr)
library(purrr)

dat %>%
  group_by(Region) %>%
  mutate(Count = map_dbl(row_number(), 
                   ~sum(between(Date[.x] - Date[-.x], 0, 365)))) %>%
  ungroup

#    Date       Region        Count
#   <date>     <chr>         <dbl>
# 1 2017-01-01 Adelaide          0
# 2 2017-01-20 Albany            0
# 3 2017-04-15 Albany            1
# 4 2017-11-03 Albany            2
# 5 2018-01-03 Albany            3
# 6 2019-12-12 Albany            0
# 7 2020-03-07 Albany            1
# 8 2014-05-19 Middleton         0
# 9 2009-07-03 Alice Springs     0
#10 2010-02-04 Alice Springs     1

Upvotes: 2

Onyambu
Onyambu

Reputation: 79198

You could do:

dat %>%
  group_by(Region) %>%
  mutate(a = accumulate(c(0, diff(Date)), ~if(.x+.y < 365).x+.y else 0))%>%
  group_by(Count = cumsum(a==0)) %>%
  mutate(Count = seq(n())- 1, a = NULL)
         
  Date       Region        Count
   <date>     <chr>         <dbl>
 1 2017-01-01 Adelaide          0
 2 2017-01-20 Albany            0
 3 2017-04-15 Albany            1
 4 2017-11-03 Albany            2
 5 2018-01-03 Albany            3
 6 2019-12-12 Albany            0
 7 2020-03-07 Albany            1
 8 2014-05-19 Middleton         0
 9 2009-07-03 Alice Springs     0
10 2010-02-04 Alice Springs     1

You could also do:

library(data.table)

setDT(dat)
dat[, Count := c(0, diff(Date)), Region][,
      Count := rowid(cumsum(replace(Count, Count>365, 0) == 0))-1]

Upvotes: 0

Related Questions