merge data tables by time intervals overlap

Question

Suppose I have two tables. One with appointments and second with receptions. Each table has filial ID, medic ID, start and end time (plan for appointments and fact for receptions) and some other data. I want to count how much of appointments have receptions inside time interval of appointment period. Reception fact can begin before appointment start time, after, it can be inside app. interval, etc.

Below I made two tables. One for appointments and one for receptions. I wrote nested loop but it works very slow. My tables contains approximately 50 mio rows each. I need fast solution for this problem. How can I do this without loop? Thanks in advance!

library(data.table)

date <- as.POSIXct('2015-01-01 14:30:00')

# appointments data table
app <- data.table(med.id = 1:10,
                  filial.id = rep(c(100,200), each = 5),
                  start.time = rep(seq(date, length.out = 5, by = "hours"),2),
                  end.time = rep(seq(date+3599, length.out = 5, by = "hours"),2),
                  A = rnorm(10))


# receptions data table
re <- data.table(med.id = c(1,11,3,4,15,6,7),
                 filial.id = c(rep(100, 5), 200,200),
                 start.time = as.POSIXct(paste(rep('2015-01-01 ',7), c('14:25:00', '14:25:00','16:32:00', '17:25:00', '16:10:00', '15:35:00','15:50:00'))),
                 end.time = as.POSIXct(paste(rep('2015-01-01 ',7), c('15:25:00', '15:20:00','17:36:00', '18:40:00', '16:10:00', '15:49:00','16:12:00'))),
                 B = rnorm(7))



app$count <- 0

for (i in 1:dim(app)[1]){
  for (j in 1:dim(re)[1]){
    if ((app$med.id[i] == re$med.id[j]) & # med.id is equal and
        app$filial.id[i] == re$filial.id[j]) { # filial.id is equal
      if ((re$start.time[j] < app$start.time[i]) & (re$end.time[j] > app$start.time[i])) { # reception starts before appointment start time and ends after appointment start time OR 
        app$count[i] <- app$count[i] + 1
      } else if ((re$start.time[j] < app$end.time[i]) & (re$start.time[j] > app$start.time[i])) { # reception starts before appointment end time and after app. start time
        app$count[i] <- app$count[i] + 1
      }
    }
  }
}

Arun · Accepted Answer

Using foverlaps():

setkey(re, med.id, filial.id, start.time, end.time)
olaps = foverlaps(app, re, which=TRUE, nomatch=0L)[, .N, by=xid]
app[, count := 0L][olaps$xid, count := olaps$N]
app
#     med.id filial.id          start.time            end.time           A count
#  1:      1       100 2015-01-01 14:30:00 2015-01-01 15:29:59  0.60878560     1
#  2:      2       100 2015-01-01 15:30:00 2015-01-01 16:29:59 -0.11545284     0
#  3:      3       100 2015-01-01 16:30:00 2015-01-01 17:29:59  0.68992084     1
#  4:      4       100 2015-01-01 17:30:00 2015-01-01 18:29:59  0.04703938     1
#  5:      5       100 2015-01-01 18:30:00 2015-01-01 19:29:59 -0.95315419     0
#  6:      6       200 2015-01-01 14:30:00 2015-01-01 15:29:59  0.26193554     0
#  7:      7       200 2015-01-01 15:30:00 2015-01-01 16:29:59  1.55206077     1
#  8:      8       200 2015-01-01 16:30:00 2015-01-01 17:29:59  0.44517362     0
#  9:      9       200 2015-01-01 17:30:00 2015-01-01 18:29:59  0.11475881     0
# 10:     10       200 2015-01-01 18:30:00 2015-01-01 19:29:59 -0.66139828     0

PS: please go through the vignettes and learn to use data tables effectively.

merge data tables by time intervals overlap

Answers (2)

Related Questions