Lynn

Create a new group whenever the time gap reaches a threshold, and compute each group's duration (R, dplyr)

I have a dataset, df, with 10,000 rows. Here is a sample:

DateA

9/9/2019 7:52:16 PM
9/9/2019 7:52:16 PM
9/9/2019 7:52:17 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:18 PM
9/9/2019 7:52:19 PM
9/10/2019 1:02:23 AM
9/10/2019 1:02:25 AM
9/10/2019 1:02:26 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:27 AM
9/10/2019 1:02:29 AM
9/10/2019 1:02:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:29 AM
9/10/2019 1:03:31 AM
9/10/2019 1:03:32 AM
9/10/2019 4:18:48 AM    
9/10/2019 4:18:50 AM    
9/10/2019 4:18:51 AM    

I would like this output:

Group   Duration

 a       3 sec
 b       6 sec
 c       3 sec
 d       3 sec

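I would like the threshold to be 1 minute (60 seconds). Whenever a gap of 60 seconds or more is detected, a new group should start, and the output should report that group's duration. For example, the gap from 1:02:29 AM to 1:03:29 AM is exactly 60 seconds, and the expected output starts group c there.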
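The grouping rule itself can be sketched in base R. This is a minimal illustration on a hypothetical vector `secs` of elapsed seconds (not the real data): `diff()` gives the gap to the previous observation, `cumsum()` turns every gap of at least `thresh1` into a new group id, and `tapply()` takes max minus min within each group. Using `>=` rather than `>` is what makes an exactly-60-second gap open a new group.

```r
# Hypothetical example: seconds elapsed since the first observation.
secs <- c(0, 0, 1, 2, 2, 3, 100, 102, 103, 163)
thresh1 <- 60

# The leading TRUE opens group 1; each later gap >= thresh1 opens a new group.
new_group <- c(TRUE, diff(secs) >= thresh1)
group_id  <- cumsum(new_group)   # 1 1 1 1 1 1 2 2 2 3

# Duration of each group: last time minus first time.
durations <- tapply(secs, group_id, function(s) max(s) - min(s))
as.vector(durations)             # 3 3 0

# With a strict ">" the exactly-60-second gap (103 -> 163) would NOT split,
# leaving only two groups.
max(cumsum(c(TRUE, diff(secs) > thresh1)))   # 2
```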

dput:

structure(list(DateA = structure(c(12L, 12L, 13L, 14L, 14L, 15L,
1L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("9/10/2019 1:02:23 AM",
"9/10/2019 1:02:25 AM", "9/10/2019 1:02:26 AM", "9/10/2019 1:02:27 AM",
"9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM", "9/10/2019 1:03:31 AM",
"9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM", "9/10/2019 4:18:50 AM",
"9/10/2019 4:18:51 AM", "9/9/2019 7:52:16 PM", "9/9/2019 7:52:17 PM",
"9/9/2019 7:52:18 PM", "9/9/2019 7:52:19 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))

I have tried:

thresh1 <- 60

library(data.table)
setDT(df)[, DateA := as.ITime(as.character(DateA))][,
  .(Duration = difftime(max(as.POSIXct(DateA)), min(as.POSIXct(DateA)),
                        unit = 'sec')),
  .(group = letters[cumsum(c(TRUE, diff(DateA) > thresh1))])]

However, something is wrong, because I only get one row of output:

  group  Duration

   a       0

I am not sure what I am doing wrong. Any suggestion is appreciated.
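A likely explanation, offered as an assumption rather than a confirmed diagnosis: `as.ITime()` keeps only the time of day, and without a `format` it does not seem to parse this month/day/year, 12-hour AM/PM layout correctly, so every value can end up as the same time, giving one group with duration 0. A sketch of a data.table fix parses the full date-time with an explicit `format` and compares numeric gaps with `>=`. The `tz = "UTC"` choice and an English/C locale for `%p` are assumptions; durations do not depend on the time zone.

```r
library(data.table)

# Sample data from the dput in the question.
df <- structure(list(DateA = structure(c(12L, 12L, 13L, 14L, 14L, 15L,
  1L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c(
  "9/10/2019 1:02:23 AM", "9/10/2019 1:02:25 AM", "9/10/2019 1:02:26 AM",
  "9/10/2019 1:02:27 AM", "9/10/2019 1:02:29 AM", "9/10/2019 1:03:29 AM",
  "9/10/2019 1:03:31 AM", "9/10/2019 1:03:32 AM", "9/10/2019 4:18:48 AM",
  "9/10/2019 4:18:50 AM", "9/10/2019 4:18:51 AM", "9/9/2019 7:52:16 PM",
  "9/9/2019 7:52:17 PM", "9/9/2019 7:52:18 PM", "9/9/2019 7:52:19 PM"),
  class = "factor")), class = "data.frame", row.names = c(NA, -20L))

thresh1 <- 60

# Parse date AND time (month/day/year, 12-hour clock) instead of as.ITime().
setDT(df)[, DateA := as.POSIXct(as.character(DateA),
                                 format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")]

# A gap of thresh1 seconds or more starts a new group (letters[] assumes <= 26 groups).
res <- df[, .(Duration = difftime(max(DateA), min(DateA), units = "secs")),
          by = .(group = letters[cumsum(c(TRUE, diff(as.numeric(DateA)) >= thresh1))])]
res
```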


Answers (1)

Ronak Shah

We can convert DateA to POSIXct, format it to minute precision, and take the difference between the maximum and minimum time within each minute.

library(dplyr)

df %>%
  # DateA is month/day/year; dmy_hms() would read 9/10/2019 as 9 October
  mutate(DateA = lubridate::mdy_hms(DateA),
         temp = format(DateA, "%Y-%m-%d %H:%M")) %>%
  group_by(temp) %>%
  summarise(duration = difftime(max(DateA), min(DateA), units = "secs"))

# A tibble: 4 x 2
#  temp             duration
#  <chr>            <drtn>  
#1 2019-09-09 19:52 3 secs  
#2 2019-09-10 01:02 6 secs  
#3 2019-09-10 01:03 3 secs  
#4 2019-09-10 04:18 3 secs  
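Grouping by clock minute reproduces the expected output for this sample because each burst happens to fall inside a single minute, but it is not the same as the 60-second gap rule: a burst that straddles a minute boundary gets split. A gap-based dplyr sketch is below, using base `as.POSIXct()` in place of lubridate. The timestamps are hypothetical (not from the question), chosen so the first two are 3 seconds apart but in different clock minutes; minute bucketing would return three groups here, the gap rule two.

```r
library(dplyr)

# Hypothetical timestamps: 7:52:58 PM and 7:53:01 PM are 3 s apart
# but fall in different clock minutes.
df <- data.frame(DateA = c("9/9/2019 7:52:58 PM", "9/9/2019 7:53:01 PM",
                           "9/9/2019 7:55:00 PM", "9/9/2019 7:55:02 PM"))
thresh1 <- 60

res <- df %>%
  # month/day/year with a 12-hour clock; assumes an English/C locale for %p
  mutate(DateA = as.POSIXct(DateA, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")) %>%
  arrange(DateA) %>%
  # new group whenever the gap to the previous row is thresh1 seconds or more
  mutate(group = cumsum(c(TRUE, diff(as.numeric(DateA)) >= thresh1))) %>%
  group_by(group) %>%
  summarise(duration = difftime(max(DateA), min(DateA), units = "secs"))
res
```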
