Reputation:
I have been trying to find a solution to this and am unsure if it would require multiple steps using dplyr and/or creating some kind of function. Essentially, I want to create an "event" column with variables that link a sequence/repeat dates and ID together and indicate the event order.
before:
event date id
2020-12-25 111
2020-12-26 111
2020-12-27 111
2020-12-30 222
2020-12-30 222
2021-01-30 333
2021-01-31 333
2021-05-25 222
2021-05-26 222
2021-07-08 222
2021-07-09 222
after:
event date id event
2020-12-25 111 1
2020-12-26 111 1
2020-12-27 111 1
2020-12-30 222 1
2020-12-30 222 1
2021-01-30 333 1
2021-01-31 333 1
2021-05-25 222 2
2021-05-26 222 2
2021-07-08 222 3
2021-07-09 222 3
The process to assign event values is essentially if the id is linked to a sequence of dates that are consecutive, then this can be grouped into an event. If the id is linked to multiple consecutive dates/individual dates, then that is a new and separate event associated with that id.
edit: added more data points to provide more info
Upvotes: 2
Views: 1078
Reputation: 160447
We can capitalize on data.table::rleid
for this.
(I converted your event_date
to a Date
-class before running these.)
library(data.table)
as.data.table(dat)[, r1 := rleid(id, cumsum(c(FALSE, diff(event_date) > 1)))
][, event2 := match(r1, unique(r1)), by = id
][, r1 := NULL ]
# event_date id event event2
# 1: 2020-12-25 111 1 1
# 2: 2020-12-26 111 1 1
# 3: 2020-12-27 111 1 1
# 4: 2020-12-30 222 1 1
# 5: 2020-12-30 222 1 1
# 6: 2021-01-30 333 1 1
# 7: 2021-01-31 333 1 1
# 8: 2021-05-25 222 2 2
# 9: 2021-05-26 222 2 2
For non-data.table
applications where you don't have it installed, you can use
my_rleid <- function(...) {
r <- rle(do.call(paste, c(list(...), sep = "_")))$lengths
rep(seq_along(r), times = r)
}
library(dplyr)
dat %>%
mutate(r1 = my_rleid(id, cumsum(c(FALSE, diff(event_date) > 1)))) %>%
group_by(id) %>%
mutate(event2 = match(r1, unique(r1))) %>%
ungroup() %>%
select(-r1)
r1 <- my_rleid(dat$id, cumsum(c(FALSE, diff(dat$event_date) > 1)))
dat$event2 <- ave(r1, dat$id, FUN = function(z) match(z, unique(z)))
dat$r1 <- NULL
Data
dat <- structure(list(event_date = structure(c(18621, 18622, 18623, 18626, 18626, 18657, 18658, 18772, 18773), class = "Date"), id = c(111L, 111L, 111L, 222L, 222L, 333L, 333L, 222L, 222L), event = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)), row.names = c(NA, -9L), class = "data.frame")
Upvotes: 1
Reputation: 388982
You can do this with the help of diff
and cumsum
-
library(dplyr)
df <- df %>%
mutate(event_date = as.Date(event_date)) %>%
group_by(id) %>%
mutate(event = cumsum(c(TRUE, diff(event_date) > 1))) %>%
ungroup
df
# event_date id event
# <date> <int> <int>
#1 2020-12-25 111 1
#2 2020-12-26 111 1
#3 2020-12-27 111 1
#4 2020-12-30 222 1
#5 2020-12-30 222 1
#6 2021-01-30 333 1
#7 2021-01-31 333 1
#8 2021-05-25 222 2
#9 2021-05-26 222 2
For each id
, event
value is incremented when the difference between consecutive dates is greater than 1.
Upvotes: 1