Reputation: 1752
I have two data.tables:
original <- data.frame(id = c(rep("RE01",5),rep("RE02",5)),date.time = head(seq.POSIXt(as.POSIXct("2015-11-01 01:00:00"),as.POSIXct("2015-11-05 01:00:00"),60*60*10),10))
compare <- data.frame(id = c("RE01","RE02"),seq = c(1,2),start = as.POSIXct(c("2015-11-01 20:00:00","2015-11-04 08:00:00")),end = as.POSIXct(c("2015-11-02 08:00:00","2015-11-04 20:00:00")))
setDT(original)
setDT(compare)
I would like to check the date in each row of original
and see if it lies between the start and finish dates of compare
whilst respecting the id
. If it does lie between the two elements, a variable should be passed to original
(compare$diff.seq
). The output should look like this:
original
id date.time diff.seq
1 RE01 2015-11-01 01:00:00 NA
2 RE01 2015-11-01 11:00:00 NA
3 RE01 2015-11-01 21:00:00 1
4 RE01 2015-11-02 07:00:00 1
5 RE01 2015-11-02 17:00:00 NA
6 RE02 2015-11-03 03:00:00 NA
7 RE02 2015-11-03 13:00:00 NA
8 RE02 2015-11-03 23:00:00 NA
9 RE02 2015-11-04 09:00:00 2
10 RE02 2015-11-04 19:00:00 2
I've been reading the manual and SO for hours and trying "on", "by" and so on.. without any success. Can anybody point me in the right direction?
Upvotes: 1
Views: 69
Reputation: 92310
As said in the comments, this is very straight forward using data.table::foverlaps
You basically have to create an additional column in the original
data set in order to set join boundaries, then key the two data sets by the columns you want to join on and then simply run forverlas
and select the desired columns
original[, end := date.time]
setkey(original, id, date.time, end)
setkey(compare, id, start, end)
foverlaps(original, compare)[, .(id, date.time, seq)]
# id date.time seq
# 1: RE01 2015-11-01 01:00:00 NA
# 2: RE01 2015-11-01 11:00:00 NA
# 3: RE01 2015-11-01 21:00:00 1
# 4: RE01 2015-11-02 07:00:00 1
# 5: RE01 2015-11-02 17:00:00 NA
# 6: RE02 2015-11-03 03:00:00 NA
# 7: RE02 2015-11-03 13:00:00 NA
# 8: RE02 2015-11-03 23:00:00 NA
# 9: RE02 2015-11-04 09:00:00 2
# 10: RE02 2015-11-04 19:00:00 2
Alternatively, you can run foverlaps
the other way around and then just update the original
data set by reference while selecting the correct rows to update
indx <- foverlaps(compare, original, which = TRUE)
original[indx$yid, diff.seq := indx$xid]
original
# id date.time end diff.seq
# 1: RE01 2015-11-01 01:00:00 2015-11-01 01:00:00 NA
# 2: RE01 2015-11-01 11:00:00 2015-11-01 11:00:00 NA
# 3: RE01 2015-11-01 21:00:00 2015-11-01 21:00:00 1
# 4: RE01 2015-11-02 07:00:00 2015-11-02 07:00:00 1
# 5: RE01 2015-11-02 17:00:00 2015-11-02 17:00:00 NA
# 6: RE02 2015-11-03 03:00:00 2015-11-03 03:00:00 NA
# 7: RE02 2015-11-03 13:00:00 2015-11-03 13:00:00 NA
# 8: RE02 2015-11-03 23:00:00 2015-11-03 23:00:00 NA
# 9: RE02 2015-11-04 09:00:00 2015-11-04 09:00:00 2
# 10: RE02 2015-11-04 19:00:00 2015-11-04 19:00:00 2
Upvotes: 2