Vint
Vint

Reputation: 499

Matching values by nearest POSIXct in two data frames keeping both dates

I have two data frames that I would like to match by nearest date, creating a new data frame with values from both (including the "nearest date match"). And it is this last part concerning including the nearest date match that has been causing me problems.

#Data sets

date1= as.POSIXct(c("2013-08-03", "2013-09-04", "2013-09-08", "2013-09-12", "2013-11-01"))
date2= as.POSIXct(c("2013-08-06", "2013-09-20", "2013-09-12", "2013-09-14", "2014-11-01","2010-11-01","2014-06-01"))

Num1<-seq(1,5)
Num2<-seq(1,7)


DF1<-data.frame(date1,Num1)
DF2<-data.frame(date2,Num2)

This for loop gives me what I want, but takes way too long to run on my dataset.

for(i in seq(1,nrow(DF1))){


  Data<-DF1[i,]

  #Date
  DataDate<-Data$date1

  MinTime<-which.min(abs(DataDate-DF2$date2))
  Data2<-DF2[MinTime,]
  DF1[i,'NearestNum2']<-Data2[,"Num2"]
  DF1[i,'NearestDate2']<-Data2[,"date2"]
  DF1[i,'TimeDiff']<-as.numeric(difftime(DF1[i,'date1'], DF1[i,'NearestDate2'], units='secs'))

}

DF1

I have found several excellent answers to similar problems using the data.table package. And this is what I tired based on previous answers:

#This line should provide with me with the numerical value associated with the nearest date
    setDT(DF1)[, NearestNum2 := setDT(DF2)[DF1, Num2, on=c(date2="date1"), roll="nearest"]]

#This line "should provide me with the actual nearest date used, but does not
    setDT(DF1)[, NearestDate1 := setDT(DF2)[DF1, date2, on=c(date2="date1"), roll="nearest"]]

The problem here is that (in contrast to the for loop result), the "nearestDate2" column using the data.table code is just a repeat of the original date. Can someone provide a data.table (or any efficient bit of code) that can provide me with the same output of the above for loop?

Upvotes: 1

Views: 226

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269471

Here are some approaches to match each row in DF1 to the closest row in DF2 by date.

1) sqldf

library(sqldf)
sqldf("select *, min(abs(date2 - date1)) AbsDiff, date2 - date1 TimeDiff
       from DF1 a join DF2 b group by a.rowid")

giving:

       date1 Num1      date2 Num2 AbsDiff TimeDiff
1 2013-08-03    1 2013-08-06    1  259200   259200
2 2013-09-04    2 2013-09-12    3  691200   691200
3 2013-09-08    3 2013-09-12    3  345600   345600
4 2013-09-12    4 2013-09-12    3       0        0
5 2013-11-01    5 2013-09-20    2 3628800 -3628800

2) Base R

abs.min <- function(d1) which.min(abs(d1 - DF2$date2))
transform(cbind(DF1, DF2[sapply(DF1$date1, abs.min), ]), 
          TimeDiff = as.numeric(date2 - date1))

giving:

         date1 Num1      date2 Num2 TimeDiff
1   2013-08-03    1 2013-08-06    1   259200
3   2013-09-04    2 2013-09-12    3   691200
3.1 2013-09-08    3 2013-09-12    3   345600
3.2 2013-09-12    4 2013-09-12    3        0
2   2013-11-01    5 2013-09-20    2 -3628800

Upvotes: 2

s_baldur
s_baldur

Reputation: 33498

Not sure why your original solution isn't working but here is an alternative setup that does the job:

setDT(DF1)
setDT(DF2)

DF1[, c("NearestNum2", "NearestDate2") := 
        DF2[.SD, on = .(date2 = date1), .(x.Num2, x.date2), roll = "nearest"]
    ][,  TimeDiff := as.numeric(difftime(date1, NearestDate2))]

DF1

        date1 Num1 NearestNum2 NearestDate2 TimeDiff
1: 2013-08-03    1           1   2013-08-06  -259200
2: 2013-09-04    2           3   2013-09-12  -691200
3: 2013-09-08    3           3   2013-09-12  -345600
4: 2013-09-12    4           3   2013-09-12        0
5: 2013-11-01    5           2   2013-09-20  3628800

Upvotes: 3

Related Questions