Reputation: 4388
I have a dataset, df, that has 10,000 rows:
DateA
9/9/2019 7:52:16 PM
9/9/2019 7:52:16 PM
9/9/2019 7:52:17 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:19 PM
9/10/2019 1:02:23 AM
9/10/2019 1:02:25 AM
9/10/2019 1:02:26 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:29 AM
9/10/2019 1:02:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:31 AM
9/10/2019 1:03:32 AM
9/10/2019 4:18:48 AM
9/10/2019 4:18:50 AM
9/10/2019 4:18:51 AM
I would like this output:
Group Duration
a 3 sec
b 6 sec
c 3 sec
d 3 sec
I would like the threshold to be 1 minute (60 seconds). Whenever the gap between consecutive rows exceeds 60 seconds, a new group should start, and each group's duration should be reported.
dput:
structure(list(DateA = structure(c(12L, 12L, 13L, 14L, 14L, 15L,
1L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("9/10/2019 1:02:23 AM",
"9/10/2019 1:02:25 AM", "9/10/2019 1:02:26 AM", "9/10/2019 1:02:27 AM",
"9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM", "9/10/2019 1:03:31 AM",
"9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM", "9/10/2019 4:18:50 AM",
"9/10/2019 4:18:51 AM", "9/9/2019 7:52:16 PM", "9/9/2019 7:52:17 PM",
"9/9/2019 7:52:18 PM", "9/9/2019 7:52:19 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))
I have tried:
thresh1 <- 60
library(data.table)
setDT(df)[, DateA := as.ITime(as.character(DateA))][,
  .(Duration = difftime(max(as.POSIXct(DateA)), min(as.POSIXct(DateA)),
                        unit = 'sec')),
  .(group = letters[cumsum(c(TRUE, diff(DateA) > thresh1))])]
However, I am doing something wrong, because I only get one row of output:
group Duration
a 0
What am I doing wrong? Any suggestions are appreciated.
Reputation: 388817
We can convert DateA to POSIXct class, format it to minute precision, and then take the difference between the max and min of DateA within each group.
library(dplyr)
df %>%
  # The data is month/day/year (9/9 PM is followed by 9/10 AM), so use mdy_hms
  mutate(DateA = lubridate::mdy_hms(DateA),
         temp = format(DateA, "%Y-%m-%d %H:%M")) %>%
  group_by(temp) %>%
  summarise(duration = difftime(max(DateA), min(DateA), units = "secs"))
# A tibble: 4 x 2
#  temp             duration
#  <chr>            <drtn>
#1 2019-09-09 19:52 3 secs
#2 2019-09-10 01:02 6 secs
#3 2019-09-10 01:03 3 secs
#4 2019-09-10 04:18 3 secs
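Grouping by calendar minute matches the expected output for this sample, but it would also split a burst that happens to cross a minute boundary. If the grouping should follow the gap between consecutive rows instead, here is a minimal base R sketch. It makes a few assumptions: the rows are already sorted by time, the timestamps are month/day/year, and the session locale understands AM/PM for `%p`. The names `thresh`, `ts`, `gaps`, `grp` and `result` are only illustrative. Note that the expected output splits at the gap of exactly 60 seconds (1:02:29 to 1:03:29), so the sketch uses `>=` rather than `>`.

```r
# Sample data from the question (month/day/year, 12-hour clock)
df <- data.frame(DateA = c(
  "9/9/2019 7:52:16 PM",  "9/9/2019 7:52:16 PM",
  "9/9/2019 7:52:17 PM",  "9/9/2019 7:52:18 PM",
  "9/9/2019 7:52:18 PM",  "9/9/2019 7:52:19 PM",
  "9/10/2019 1:02:23 AM", "9/10/2019 1:02:25 AM",
  "9/10/2019 1:02:26 AM", "9/10/2019 1:02:27 AM",
  "9/10/2019 1:02:27 AM", "9/10/2019 1:02:29 AM",
  "9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM",
  "9/10/2019 1:03:29 AM", "9/10/2019 1:03:31 AM",
  "9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM",
  "9/10/2019 4:18:50 AM", "9/10/2019 4:18:51 AM"
), stringsAsFactors = FALSE)

thresh <- 60  # seconds

# Parse to POSIXct; as.character() also covers the factor column from dput.
# Assumption: the locale recognises AM/PM for %p.
ts <- as.POSIXct(as.character(df$DateA),
                 format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Gap in seconds between each row and the previous one
gaps <- diff(as.numeric(ts))

# A new group starts at the first row and wherever the gap reaches the threshold
grp <- cumsum(c(TRUE, gaps >= thresh))

# Duration of each group = last timestamp minus first timestamp, in seconds.
# letters only covers 26 groups; use grp itself as the label for larger data.
result <- data.frame(
  Group    = letters[sort(unique(grp))],
  Duration = as.vector(tapply(as.numeric(ts), grp,
                              function(x) max(x) - min(x)))
)
print(result)
#   Group Duration
# 1     a        3
# 2     b        6
# 3     c        3
# 4     d        3
```

With a strict `gaps > thresh`, the 60-second gap would not start a new group, and the sample would give three groups with durations 3, 69 and 3 seconds.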