Reputation: 471
I have a data.table in the following format:
DT <- data.table(
id=c("123", "123", "125", "125", "123", "123", "123"),
action=c("started", "finished", "started", "finished", "started", "started", "finished"), time=c(as.POSIXct("2014-02-19 03:24:00"), as.POSIXct("2014-02-19 03:29:00"), as.POSIXct("2014-02-19 03:30:00"), as.POSIXct("2014-02-19 03:34:00"), as.POSIXct("2014-02-19 08:24:00"), as.POSIXct("2014-02-19 09:45:00"), as.POSIXct("2014-02-19 10:33:00")))
id action time 1 123 started 2014-02-19 03:24:00 2 123 finished 2014-02-19 03:29:00 3 125 started 2014-02-19 03:30:00 4 125 finished 2014-02-19 03:34:00 5 123 started 2014-02-19 08:24:00 6 123 started 2014-02-19 09:45:00 7 123 finished 2014-02-19 10:33:00
I would like to add a column that shows the time differences (action: "finished"-"started") between the rows per id. The table is sorted by time, but it is possible that there is missing data (e.g. it might happen that a "finished"-action is missing as it is the case in rows 5 and 6. In this case row 5 should be ignored and the difference between 6 and 7 is calculated. The final table should look like this.
id action time durationInMinutes 1 123 started 2014-02-19 03:24:00 NA 2 123 finished 2014-02-19 03:29:00 5 3 125 started 2014-02-19 03:30:00 NA 4 125 finished 2014-02-19 03:34:00 4 5 123 started 2014-02-19 08:24:00 NA 6 123 started 2014-02-19 09:45:00 NA 7 123 finished 2014-02-19 10:33:00 48
Is there a data.table solution for that?
Upvotes: 0
Views: 166
Reputation: 49448
DT[, duration := as.integer(time[action == "finished"] -
tail(time[action == "started"], 1))
, by = cumsum(c(0, tail(lag(id) != id, -1)))][
action == "started", duration := NA]
DT
# id action time duration
#1: 123 started 2014-02-19 03:24:00 NA
#2: 123 finished 2014-02-19 03:29:00 5
#3: 125 started 2014-02-19 03:30:00 NA
#4: 125 finished 2014-02-19 03:34:00 4
#5: 123 started 2014-02-19 08:24:00 NA
#6: 123 started 2014-02-19 09:45:00 NA
#7: 123 finished 2014-02-19 10:33:00 48
Upvotes: 2