Guilherme Campos
Guilherme Campos

Reputation: 45

Time intervals from data across multiple rows

I have a data structure similar to the one below:

# A tibble: 5 x 4
  group  task start end  
  <chr> <dbl> <chr> <chr>
1 a         1 01:00 01:30
2 a         2 02:00 02:25
3 b         3 01:05 01:40
4 b         4 01:50 02:30
5 a         5 03:00 03:30

Basically i need to compute the time difference between the end of the last task and the start of the next one - for each group - given that it needs to be following a cronological order, and belong to the same group.

Desired output:

# A tibble: 5 x 7
  group last_task last_end next_task next_start next_end interval
  <chr>     <dbl> <chr>        <dbl> <chr>      <chr>    <chr>   
1 a            NA NA               1 01:00      01:30    NA      
2 a             1 01:30            2 02:00      02:25    00:30   
3 b            NA NA               3 01:05      01:40    NA      
4 b             3 01:40            4 01:50      02:30    00:10   
5 a             2 02:25            5 03:00      03:30    00:35   

Upvotes: 2

Views: 65

Answers (1)

Ian Campbell
Ian Campbell

Reputation: 24790

Here is an approach with lead and lag from dplyr.

The output differs from your expected output, but I believe it matches your request in words because of grouping.

I use lubridate since your times are actually factors. It will fail for tasks which cross dates.

library(dplyr)
library(lubridate)
data %>%
  group_by(group) %>%
  arrange(task) %>%
  mutate(last_task = lag(task),
         last_end = lag(end),
         next_task = lead(task),
         next_start = lead(start),
         interval = ymd_hm(paste(today(),start,sep = " ")) - ymd_hm(paste(today(),lag(end),sep = " ")))
# A tibble: 5 x 9
  group  task start end   last_task last_end next_task next_start interval
  <fct> <int> <fct> <fct>     <int> <fct>        <int> <fct>      <drtn>  
1 a         1 01:00 01:30        NA NA               2 02:00      NA mins 
2 a         2 02:00 02:25         1 01:30            5 03:00      30 mins 
3 b         3 01:05 01:40        NA NA               4 01:50      NA mins 
4 b         4 01:50 02:30         3 01:40           NA NA         10 mins 
5 a         5 03:00 03:30         2 02:25           NA NA         35 mins 

If you're set on the interval format, we can hack that together:

data %>%
  group_by(group) %>%
  arrange(task) %>%
  mutate(last_task = lag(task),
         last_end = lag(end),
         next_task = lead(task),
         next_start = lead(start),
         interval = ymd_hm(paste(today(),start,sep = " ")) - ymd_hm(paste(today(),lag(end),sep = " ")),
         interval = ifelse(is.na(interval),NA,paste(hour(as.period(interval)),minute(as.period(interval)),sep = ":")))
# A tibble: 5 x 9
  group  task start end   last_task last_end next_task next_start interval
  <fct> <int> <fct> <fct>     <int> <fct>        <int> <fct>      <chr>   
1 a         1 01:00 01:30        NA NA               2 02:00      NA      
2 a         2 02:00 02:25         1 01:30            5 03:00      0:30    
3 b         3 01:05 01:40        NA NA               4 01:50      NA      
4 b         4 01:50 02:30         3 01:40           NA NA         0:10    
5 a         5 03:00 03:30         2 02:25           NA NA         0:35   

Upvotes: 1

Related Questions