Reputation: 375
I have a large dataset structured as follows
dat <- data.frame("Date" = c("2017-01-01", "2017-01-20", "2017-04-15", "2017-11-03", "2018-01-03", "2019-12-12", "2020-03-07", "2014-05-19", "2009-07-03", "2010-02-04"),
"Region" = c("Adelaide", "Albany", "Albany", "Albany", "Albany", "Albany", "Albany", "Middleton", "Alice Springs", "Alice Springs"))
dat$Date <- as.Date(dat$Date, "%Y-%m-%d")
For each record I would like to count the number records if they are within the previous 12 months and only if they are within the same region. Any help in achieving this would be greatly appreciated?
My final dataset should look the same as the original dataset but with the addition of the below count column
dat$Count <- c(0, 0, 1, 2, 3, 0, 1, 0, 0, 1)
This count column shows that for the first record in the dataset there are no other records from the same region that occurred in the previous 12 months, however for 5th record in the dataset there are 3 other records that occurred in the previous 12 months within the same region.
I have tried a range of solutions in dplyr
using group_by()
, count()
and summarise()
but haven't come close to my intention yet.
Upvotes: 2
Views: 150
Reputation: 388817
For each Region
you can take the difference of the current date with all other dates and count the number of dates that are between 0 and 365 days.
library(dplyr)
library(purrr)
dat %>%
group_by(Region) %>%
mutate(Count = map_dbl(row_number(),
~sum(between(Date[.x] - Date[-.x], 0, 365)))) %>%
ungroup
# Date Region Count
# <date> <chr> <dbl>
# 1 2017-01-01 Adelaide 0
# 2 2017-01-20 Albany 0
# 3 2017-04-15 Albany 1
# 4 2017-11-03 Albany 2
# 5 2018-01-03 Albany 3
# 6 2019-12-12 Albany 0
# 7 2020-03-07 Albany 1
# 8 2014-05-19 Middleton 0
# 9 2009-07-03 Alice Springs 0
#10 2010-02-04 Alice Springs 1
Upvotes: 2
Reputation: 79198
You could do:
dat %>%
group_by(Region) %>%
mutate(a = accumulate(c(0, diff(Date)), ~if(.x+.y < 365).x+.y else 0))%>%
group_by(Count = cumsum(a==0)) %>%
mutate(Count = seq(n())- 1, a = NULL)
Date Region Count
<date> <chr> <dbl>
1 2017-01-01 Adelaide 0
2 2017-01-20 Albany 0
3 2017-04-15 Albany 1
4 2017-11-03 Albany 2
5 2018-01-03 Albany 3
6 2019-12-12 Albany 0
7 2020-03-07 Albany 1
8 2014-05-19 Middleton 0
9 2009-07-03 Alice Springs 0
10 2010-02-04 Alice Springs 1
You could also do:
library(data.table)
setDT(dat)
dat[, Count := c(0, diff(Date)), Region][,
Count := rowid(cumsum(replace(Count, Count>365, 0) == 0))-1]
Upvotes: 0