Reputation: 301
I have a dataframe [df] like this:
marker_id timestamp
A 2020-10-01 07:32:14
A 2020-10-01 07:34:09
A 2020-10-01 10:13:00
A 2020-10-05 06:55:04
A 2020-10-05 06:59:14
A 2020-10-09 17:44:02
B 2020-10-01 13:58:47
B 2020-10-02 04:11:38
B 2020-10-02 04:15:07
I would like to know whether there is a way of assigning groups to individual timestamp sequences for each marker_id. The sequence should contain rows where the difference between 2 rows in less than 1 hour.
The desired result would look like this:
marker_id timestamp group_id
A 2020-10-01 07:32:14 1
A 2020-10-01 07:34:09 1
A 2020-10-01 10:13:00 2
A 2020-10-05 06:55:04 3
A 2020-10-05 06:59:14 3
A 2020-10-09 17:44:02 4
B 2020-10-01 13:58:47 5
B 2020-10-02 04:11:38 6
B 2020-10-02 04:15:07 6
Is there any way of doing this? Thank you for any suggestions.
Upvotes: 1
Views: 173
Reputation: 9087
Here is some dummy data.
library(tidyverse)
df <- tribble(
~marker_id, ~timestamp,
'A', '2020-10-01 07:32:14',
'A', '2020-10-01 07:34:09',
'A', '2020-10-01 10:13:00',
'A', '2020-10-05 06:55:04',
'A', '2020-10-05 06:59:14',
'A', '2020-10-09 17:44:02',
'B', '2020-10-01 13:58:47',
'B', '2020-10-02 04:11:38',
'B', '2020-10-02 04:15:07'
) %>%
mutate(timestamp = lubridate::as_datetime(timestamp))
Use lag
to get the difference beween rows and convert that to hours. group_id
can be computed by checking if it is >= 1
and getting the cumsum
.
df %>%
mutate(
group_id = cumsum(
abs(as.numeric(timestamp - lag(timestamp, default = min(timestamp)), units = "hours")) >= 1
) + 1
)
#> # A tibble: 9 x 3
#> marker_id timestamp group_id
#> <chr> <dttm> <dbl>
#> 1 A 2020-10-01 07:32:14 1
#> 2 A 2020-10-01 07:34:09 1
#> 3 A 2020-10-01 10:13:00 2
#> 4 A 2020-10-05 06:55:04 3
#> 5 A 2020-10-05 06:59:14 3
#> 6 A 2020-10-09 17:44:02 4
#> 7 B 2020-10-01 13:58:47 5
#> 8 B 2020-10-02 04:11:38 6
#> 9 B 2020-10-02 04:15:07 6
Upvotes: 2