Reputation: 406
I have a data.table
ordered by DateTime like below, and I want to analyse the time difference between ID 1 and IDs 2 to 5, separated by state and by the repeating sequences of the state.
structure(list(DateTime = structure(c(1533081601.42, 1533081601.98,
1533081601.98, 1533081601.98, 1533081602.02, 1533081928.06, 1533081928.22,
1533081928.22, 1533081928.22, 1533081928.22, 1533081952.34, 1533081952.94,
1533081952.94, 1533081952.94, 1533081952.98, 1533081953.98, 1533081954.14,
1533081954.14, 1533081954.14, 1533081954.14), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), ID = c(1, 3, 4, 5, 2, 1, 2, 3, 4,
5, 1, 3, 4, 5, 2, 1, 2, 3, 4, 5), state = c(0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L)), class = c("data.table",
"data.frame"))
ID 1 is always first in a sequence, but the order of IDs 2 to 5 after ID 1 varies. The state values simply alternate between 0 and 1.
For each run of the IDs 1 to 5 having the same state (0 or 1), I want to get the time difference between ID 1 and each of the other IDs.
A data.table approach is preferred, but any advice is welcome!
Upvotes: 0
Views: 91
Reputation: 42544
Supposing that the data is given in the correct order,
library(data.table)
df[, time_diff := round(DateTime - first(DateTime), 3L), by = cumsum(ID == 1)][]
will return
               DateTime ID state time_diff
 1: 2018-08-01 00:00:01  1     0 0.00 secs
 2: 2018-08-01 00:00:01  3     0 0.56 secs
 3: 2018-08-01 00:00:01  4     0 0.56 secs
 4: 2018-08-01 00:00:01  5     0 0.56 secs
 5: 2018-08-01 00:00:02  2     0 0.60 secs
 6: 2018-08-01 00:05:28  1     1 0.00 secs
 7: 2018-08-01 00:05:28  2     1 0.16 secs
 8: 2018-08-01 00:05:28  3     1 0.16 secs
 9: 2018-08-01 00:05:28  4     1 0.16 secs
10: 2018-08-01 00:05:28  5     1 0.16 secs
11: 2018-08-01 00:05:52  1     0 0.00 secs
12: 2018-08-01 00:05:52  3     0 0.60 secs
13: 2018-08-01 00:05:52  4     0 0.60 secs
14: 2018-08-01 00:05:52  5     0 0.60 secs
15: 2018-08-01 00:05:52  2     0 0.64 secs
16: 2018-08-01 00:05:53  1     1 0.00 secs
17: 2018-08-01 00:05:54  2     1 0.16 secs
18: 2018-08-01 00:05:54  3     1 0.16 secs
19: 2018-08-01 00:05:54  4     1 0.16 secs
20: 2018-08-01 00:05:54  5     1 0.16 secs
The grouping variable is created on the fly with by = cumsum(ID == 1), so it advances every time ID 1 is encountered. Then, a new column is created by reference, i.e., without copying the whole data object.
As each group starts with ID 1, we can safely use first(DateTime) instead of DateTime[ID == 1], thereby avoiding another lookup operation.
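To see what the on-the-fly grouping produces, here is a minimal sketch on a toy ID vector (not the OP's data): cumsum over the logical vector ID == 1 yields a group index that increments at every ID 1.

```r
# Toy vector of IDs: each 1 starts a new run
ID <- c(1, 3, 4, 5, 2, 1, 2, 3, 4, 5)

# cumsum(ID == 1) labels every row with its run number
cumsum(ID == 1)
# 1 1 1 1 1 2 2 2 2 2
```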
Upvotes: 3
Reputation: 428
A solution that works on a copy of the original table:
dt <-
structure(list(DateTime = structure(c(1533081601.42, 1533081601.98,
1533081601.98, 1533081601.98, 1533081602.02, 1533081928.06, 1533081928.22,
1533081928.22, 1533081928.22, 1533081928.22, 1533081952.34, 1533081952.94,
1533081952.94, 1533081952.94, 1533081952.98, 1533081953.98, 1533081954.14,
1533081954.14, 1533081954.14, 1533081954.14), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), ID = c(1, 3, 4, 5, 2, 1, 2, 3, 4,
5, 1, 3, 4, 5, 2, 1, 2, 3, 4, 5), state = c(0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L)), class = c("data.table",
"data.frame"))
setorder(dt, ID, DateTime)
dt[, run := 1:.N, by = ID]  # index each run within an ID
dt2 <- copy(dt)
# cartesian merge of the ID 1 rows against all rows of the same (state, run)
dt3 <- merge(dt[ID == 1], dt2, by = c("state", "run"), allow.cartesian = TRUE)
dt3[, timediff := DateTime.y - DateTime.x]
dt4 <- dt3[ID.x == 1 & ID.y %in% 2:5]
setnames(dt4, c("DateTime.y", "ID.y"), c("DateTime", "ID"))
result <- merge(dt, dt4[, list(DateTime, ID, state, run, timediff)],
                by = c("DateTime", "ID", "state", "run"), all.x = TRUE)
Using a cartesian merge can be problematic if your data is huge, though.
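For large data, the cartesian merge can be avoided with an update join: pull the ID 1 timestamp for each (state, run) pair onto every row directly. A minimal sketch on toy data (the names ref and t0 are made up for illustration, not from the answer above):

```r
library(data.table)

# Toy table in the same shape as the question: ID 1 leads each run
dt <- data.table(
  DateTime = as.POSIXct(c(0, 0.5, 0.5, 10, 10.2, 10.2),
                        origin = "1970-01-01", tz = "UTC"),
  ID    = c(1, 2, 3, 1, 2, 3),
  state = c(0L, 0L, 0L, 1L, 1L, 1L)
)

setorder(dt, ID, DateTime)
dt[, run := 1:.N, by = ID]  # run index within each ID, as in the answer above

# Lookup table: the ID 1 timestamp per (state, run)
ref <- dt[ID == 1, .(state, run, t0 = DateTime)]

# Update join: add timediff without materialising a cartesian product
dt[ref, on = .(state, run), timediff := DateTime - i.t0]
```

This assigns the new column by reference and only ever joins one ID 1 row per group, so memory stays proportional to the input.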
Upvotes: 0