kukuk1de

Reputation: 406

Analyse time difference between events in repeating sequence in R

I have a data.table ordered by DateTime like below and want to analyse the time difference between ID 1 and IDs 2 to 5, separated by state and by the repeating sequences of the state.

structure(list(
  DateTime = structure(c(1533081601.42, 1533081601.98, 1533081601.98,
                         1533081601.98, 1533081602.02, 1533081928.06,
                         1533081928.22, 1533081928.22, 1533081928.22,
                         1533081928.22, 1533081952.34, 1533081952.94,
                         1533081952.94, 1533081952.94, 1533081952.98,
                         1533081953.98, 1533081954.14, 1533081954.14,
                         1533081954.14, 1533081954.14),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  ID = c(1, 3, 4, 5, 2, 1, 2, 3, 4, 5, 1, 3, 4, 5, 2, 1, 2, 3, 4, 5),
  state = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
            0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L)
), class = c("data.table", "data.frame"))

ID 1 is always first in a sequence, but the order of IDs 2 to 5 after ID 1 varies. The state simply alternates between 0 and 1 from one sequence to the next.

For each run of the IDs 1 to 5 sharing the same state (1 or 0), I want to get the time difference between ID 1 and each of the other IDs.

A data.table approach is preferred, but any advice is welcome!

Upvotes: 0

Views: 91

Answers (2)

Uwe

Reputation: 42544

Supposing that the data is given in the correct order,

library(data.table)
# df is the data.table from the question
df[, time_diff := round(DateTime - first(DateTime), 3L), by = cumsum(ID == 1)][]

will return

               DateTime ID state time_diff
 1: 2018-08-01 00:00:01  1     0 0.00 secs
 2: 2018-08-01 00:00:01  3     0 0.56 secs
 3: 2018-08-01 00:00:01  4     0 0.56 secs
 4: 2018-08-01 00:00:01  5     0 0.56 secs
 5: 2018-08-01 00:00:02  2     0 0.60 secs
 6: 2018-08-01 00:05:28  1     1 0.00 secs
 7: 2018-08-01 00:05:28  2     1 0.16 secs
 8: 2018-08-01 00:05:28  3     1 0.16 secs
 9: 2018-08-01 00:05:28  4     1 0.16 secs
10: 2018-08-01 00:05:28  5     1 0.16 secs
11: 2018-08-01 00:05:52  1     0 0.00 secs
12: 2018-08-01 00:05:52  3     0 0.60 secs
13: 2018-08-01 00:05:52  4     0 0.60 secs
14: 2018-08-01 00:05:52  5     0 0.60 secs
15: 2018-08-01 00:05:52  2     0 0.64 secs
16: 2018-08-01 00:05:53  1     1 0.00 secs
17: 2018-08-01 00:05:54  2     1 0.16 secs
18: 2018-08-01 00:05:54  3     1 0.16 secs
19: 2018-08-01 00:05:54  4     1 0.16 secs
20: 2018-08-01 00:05:54  5     1 0.16 secs

The grouping variable is created on the fly with by = cumsum(ID == 1), so it is incremented every time an ID of 1 is encountered. Then a new column is created by reference, i.e., without copying the whole data object.

As each group starts with ID 1, we can safely use first(DateTime) instead of DateTime[ID == 1], thereby avoiding another lookup operation.
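As a quick illustration of the grouping trick, here is cumsum(ID == 1) applied to the first ten IDs from the sample data: every ID-1 row bumps the counter, so it ends up sharing a group number with the four rows that follow it.

```r
# The first ten IDs from the question's data
ID <- c(1, 3, 4, 5, 2, 1, 2, 3, 4, 5)

# cumsum over the logical vector: each TRUE (ID == 1) starts a new group
cumsum(ID == 1)
#> [1] 1 1 1 1 1 2 2 2 2 2
```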

Upvotes: 3

tyumru

Reputation: 428

A solution using a duplicate of the original table:

dt <-
structure(list(
  DateTime = structure(c(1533081601.42, 1533081601.98, 1533081601.98,
                         1533081601.98, 1533081602.02, 1533081928.06,
                         1533081928.22, 1533081928.22, 1533081928.22,
                         1533081928.22, 1533081952.34, 1533081952.94,
                         1533081952.94, 1533081952.94, 1533081952.98,
                         1533081953.98, 1533081954.14, 1533081954.14,
                         1533081954.14, 1533081954.14),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  ID = c(1, 3, 4, 5, 2, 1, 2, 3, 4, 5, 1, 3, 4, 5, 2, 1, 2, 3, 4, 5),
  state = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
            0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L)
), class = c("data.table", "data.frame"))

library(data.table)

setorder(dt, ID, DateTime)
dt[, run := seq_len(.N), by = ID]   # number each repetition per ID
dt2 <- copy(dt)
dt3 <- merge(dt[ID == 1], dt2, by = c("state", "run"), allow.cartesian = TRUE)
dt3[, timediff := DateTime.y - DateTime.x]
dt4 <- dt3[ID.x == 1 & ID.y %in% 2:5]
setnames(dt4, c("DateTime.y", "ID.y"), c("DateTime", "ID"))
result <- merge(dt, dt4[, list(DateTime, ID, state, run, timediff)],
                by = c("DateTime", "ID", "state", "run"), all.x = TRUE)

Using a cartesian merge can be problematic if your data is huge, though.
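For large data, the cartesian join can be avoided by computing the difference directly within each (state, run) group. A minimal sketch on a single made-up run (column names match the question; the timestamp values are illustrative, not from the sample data):

```r
library(data.table)

# One illustrative run; values are made up for the sketch
dt <- data.table(
  DateTime = as.POSIXct(c(0, 0.56, 0.56, 0.60, 0.56),
                        origin = "1970-01-01", tz = "UTC"),
  ID = c(1, 3, 4, 2, 5),
  state = 0L
)

dt[, run := seq_len(.N), by = ID]              # repetition counter per ID
dt[, timediff := DateTime - DateTime[ID == 1], # offset from ID 1's timestamp
   by = .(state, run)]
```

This keeps everything in one table and adds timediff by reference, with no intermediate copies.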

Upvotes: 0
