Miriam
Miriam

Reputation: 471

How to add column based on specific row differences?

I have a data.table in the following format:

DT <- data.table(
id=c("123", "123", "125", "125", "123", "123", "123"), 
action=c("started", "finished", "started", "finished", "started", "started", "finished"), time=c(as.POSIXct("2014-02-19 03:24:00"), as.POSIXct("2014-02-19 03:29:00"), as.POSIXct("2014-02-19 03:30:00"), as.POSIXct("2014-02-19 03:34:00"), as.POSIXct("2014-02-19 08:24:00"), as.POSIXct("2014-02-19 09:45:00"), as.POSIXct("2014-02-19 10:33:00")))
    id  action      time
1   123 started     2014-02-19 03:24:00
2   123 finished    2014-02-19 03:29:00
3   125 started     2014-02-19 03:30:00
4   125 finished    2014-02-19 03:34:00
5   123 started     2014-02-19 08:24:00
6   123 started     2014-02-19 09:45:00
7   123 finished    2014-02-19 10:33:00

I would like to add a column that shows the time differences (action: "finished"-"started") between the rows per id. The table is sorted by time, but it is possible that there is missing data (e.g. it might happen that a "finished"-action is missing as it is the case in rows 5 and 6. In this case row 5 should be ignored and the difference between 6 and 7 is calculated. The final table should look like this.

    id  action      time                   durationInMinutes
1   123 started     2014-02-19 03:24:00    NA
2   123 finished    2014-02-19 03:29:00    5
3   125 started     2014-02-19 03:30:00    NA
4   125 finished    2014-02-19 03:34:00    4
5   123 started     2014-02-19 08:24:00    NA
6   123 started     2014-02-19 09:45:00    NA
7   123 finished    2014-02-19 10:33:00    48

Is there a data.table solution for that?

Upvotes: 0

Views: 166

Answers (1)

eddi
eddi

Reputation: 49448

DT[, duration := as.integer(time[action == "finished"] -
                            tail(time[action == "started"], 1))
   , by = cumsum(c(0, tail(lag(id) != id, -1)))][
     action == "started", duration := NA]
DT
#    id   action                time duration
#1: 123  started 2014-02-19 03:24:00       NA
#2: 123 finished 2014-02-19 03:29:00        5
#3: 125  started 2014-02-19 03:30:00       NA
#4: 125 finished 2014-02-19 03:34:00        4
#5: 123  started 2014-02-19 08:24:00       NA
#6: 123  started 2014-02-19 09:45:00       NA
#7: 123 finished 2014-02-19 10:33:00       48

Upvotes: 2

Related Questions