Reputation: 9018
I have this data frame
dat = data.frame(ID= c(1,1,1,2,3,3),
NumberInSequence= c(1,2,3,1,1,2),
StartTime = as.POSIXct(c("2016-01-01 05:52:05 GMT","2016-01-01 05:52:11 GMT","2016-01-01 05:52:16 GMT","2016-01-01 05:40:05 GMT","2016-01-01 06:12:13 GMT","2016-01-01 07:12:26 GMT")) ,
EndTime = as.POSIXct(c("2016-01-01 05:52:10 GMT","2016-01-01 05:52:16 GMT","2016-01-01 05:52:30 GMT","2016-01-01 05:46:05 GMT","2016-01-01 06:12:25 GMT","2016-01-01 08:00:00 GMT") )
)
dat
ID NumberInSequence StartTime EndTime
1 1 1 2016-01-01 05:52:05 2016-01-01 05:52:10
2 1 2 2016-01-01 05:52:11 2016-01-01 05:52:16
3 1 3 2016-01-01 05:52:16 2016-01-01 05:52:30
4 2 1 2016-01-01 05:40:05 2016-01-01 05:46:05
5 3 1 2016-01-01 06:12:13 2016-01-01 06:12:25
6 3 2 2016-01-01 07:12:26 2016-01-01 08:00:00
Each ID can have 1 or more rows and each row is in time order for every ID. I'd like to add 2 columns:
1st column: "Duration" which the length of time, in seconds, BETWEEN the EndTime of an ID to the next start time of the same id.
2nd column: "Next start time" which is the actual next start time of the same ID.
So the results should look like this:
dat$Duration = ?
dat$NextStartTime = ?
ID NumberInSequence StartTime EndTime Duration NextStartTime
1 1 1 2016-01-01 05:52:05 2016-01-01 05:52:10 1 2016-01-01 05:52:11
2 1 2 2016-01-01 05:52:11 2016-01-01 05:52:16 0 2016-01-01 05:52:16
3 1 3 2016-01-01 05:52:16 2016-01-01 05:52:30 NA NA
4 2 1 2016-01-01 05:40:05 2016-01-01 05:46:05 NA NA
5 3 1 2016-01-01 06:12:13 2016-01-01 06:12:25 3601 2016-01-01 07:12:26
6 3 2 2016-01-01 07:12:26 2016-01-01 08:00:00 NA NA
For example for ID = 3 a NUMBER IN SEQUENCE = 1 the end time is 61 seconds later so the duration is 61 and the next start time is 07:12:26 from ID = 3 and number in sequence = 2.
For rows where there is there is no next start time NA should appear like ID = 2 and number in sequence = 1.
I was looking at doing this with dplyr somehow....
######## UPDATElead() is the answer but there is a minor issue. Please see this post
using dplyr lead but with some contraints
Upvotes: 0
Views: 50
Reputation: 43334
For NextStartTime
, as long as your data is in order (use arrange(ID, NumberInSequence)
if unsure), you can use dplyr::lead
, which is like stats::lag
on a time series with a negative lag.
For Duration
, you can subtract times, but if the units can be on different units of magnitude, it's safer to use difftime
directly so you can keep consistent units.
All together:
library(dplyr)
dat %>% group_by(ID) %>%
arrange(ID, NumberInSequence) %>% # not necessary if already arranged, as here
mutate(NextStartTime = lead(StartTime),
Duration = difftime(NextStartTime, EndTime, units = 's'))
## Source: local data frame [6 x 6]
## Groups: ID [3]
##
## ID NumberInSequence StartTime EndTime NextStartTime Duration
## <dbl> <dbl> <dttm> <dttm> <dttm> <time>
## 1 1 1 2016-01-01 05:52:05 2016-01-01 05:52:10 2016-01-01 05:52:11 1 secs
## 2 1 2 2016-01-01 05:52:11 2016-01-01 05:52:16 2016-01-01 05:52:16 0 secs
## 3 1 3 2016-01-01 05:52:16 2016-01-01 05:52:30 <NA> NA secs
## 4 2 1 2016-01-01 05:40:05 2016-01-01 05:46:05 <NA> NA secs
## 5 3 1 2016-01-01 06:12:13 2016-01-01 06:12:25 2016-01-01 07:12:26 3601 secs
## 6 3 2 2016-01-01 07:12:26 2016-01-01 08:00:00 <NA> NA secs
Upvotes: 1