Reputation: 45
I have a data structure similar to the one below:
# A tibble: 5 x 4
group task start end
<chr> <dbl> <chr> <chr>
1 a 1 01:00 01:30
2 a 2 02:00 02:25
3 b 3 01:05 01:40
4 b 4 01:50 02:30
5 a 5 03:00 03:30
Basically i need to compute the time difference between the end of the last task and the start of the next one - for each group - given that it needs to be following a cronological order, and belong to the same group.
Desired output:
# A tibble: 5 x 7
group last_task last_end next_task next_start next_end interval
<chr> <dbl> <chr> <dbl> <chr> <chr> <chr>
1 a NA NA 1 01:00 01:30 NA
2 a 1 01:30 2 02:00 02:25 00:30
3 b NA NA 3 01:05 01:40 NA
4 b 3 01:40 4 01:50 02:30 00:10
5 a 2 02:25 5 03:00 03:30 00:35
Upvotes: 2
Views: 65
Reputation: 24790
Here is an approach with lead
and lag
from dplyr
.
The output differs from your expected output, but I believe it matches your request in words because of grouping.
I use lubridate
since your times are actually factors. It will fail for tasks which cross dates.
library(dplyr)
library(lubridate)
data %>%
group_by(group) %>%
arrange(task) %>%
mutate(last_task = lag(task),
last_end = lag(end),
next_task = lead(task),
next_start = lead(start),
interval = ymd_hm(paste(today(),start,sep = " ")) - ymd_hm(paste(today(),lag(end),sep = " ")))
# A tibble: 5 x 9
group task start end last_task last_end next_task next_start interval
<fct> <int> <fct> <fct> <int> <fct> <int> <fct> <drtn>
1 a 1 01:00 01:30 NA NA 2 02:00 NA mins
2 a 2 02:00 02:25 1 01:30 5 03:00 30 mins
3 b 3 01:05 01:40 NA NA 4 01:50 NA mins
4 b 4 01:50 02:30 3 01:40 NA NA 10 mins
5 a 5 03:00 03:30 2 02:25 NA NA 35 mins
If you're set on the interval
format, we can hack that together:
data %>%
group_by(group) %>%
arrange(task) %>%
mutate(last_task = lag(task),
last_end = lag(end),
next_task = lead(task),
next_start = lead(start),
interval = ymd_hm(paste(today(),start,sep = " ")) - ymd_hm(paste(today(),lag(end),sep = " ")),
interval = ifelse(is.na(interval),NA,paste(hour(as.period(interval)),minute(as.period(interval)),sep = ":")))
# A tibble: 5 x 9
group task start end last_task last_end next_task next_start interval
<fct> <int> <fct> <fct> <int> <fct> <int> <fct> <chr>
1 a 1 01:00 01:30 NA NA 2 02:00 NA
2 a 2 02:00 02:25 1 01:30 5 03:00 0:30
3 b 3 01:05 01:40 NA NA 4 01:50 NA
4 b 4 01:50 02:30 3 01:40 NA NA 0:10
5 a 5 03:00 03:30 2 02:25 NA NA 0:35
Upvotes: 1