Reputation: 499
I have two data frames that I would like to match by nearest date, creating a new data frame with values from both (including the "nearest date match"). And it is this last part concerning including the nearest date match that has been causing me problems.
#Data sets
date1= as.POSIXct(c("2013-08-03", "2013-09-04", "2013-09-08", "2013-09-12", "2013-11-01"))
date2= as.POSIXct(c("2013-08-06", "2013-09-20", "2013-09-12", "2013-09-14", "2014-11-01","2010-11-01","2014-06-01"))
Num1<-seq(1,5)
Num2<-seq(1,7)
DF1<-data.frame(date1,Num1)
DF2<-data.frame(date2,Num2)
This for loop gives me what I want, but takes way too long to run on my dataset.
for(i in seq(1,nrow(DF1))){
Data<-DF1[i,]
#Date
DataDate<-Data$date1
MinTime<-which.min(abs(DataDate-DF2$date2))
Data2<-DF2[MinTime,]
DF1[i,'NearestNum2']<-Data2[,"Num2"]
DF1[i,'NearestDate2']<-Data2[,"date2"]
DF1[i,'TimeDiff']<-as.numeric(difftime(DF1[i,'date1'], DF1[i,'NearestDate2'], units='secs'))
}
DF1
I have found several excellent answers to similar problems using the data.table package. And this is what I tired based on previous answers:
#This line should provide with me with the numerical value associated with the nearest date
setDT(DF1)[, NearestNum2 := setDT(DF2)[DF1, Num2, on=c(date2="date1"), roll="nearest"]]
#This line "should provide me with the actual nearest date used, but does not
setDT(DF1)[, NearestDate1 := setDT(DF2)[DF1, date2, on=c(date2="date1"), roll="nearest"]]
The problem here is that (in contrast to the for loop result), the "nearestDate2" column using the data.table code is just a repeat of the original date. Can someone provide a data.table (or any efficient bit of code) that can provide me with the same output of the above for loop?
Upvotes: 1
Views: 226
Reputation: 269471
Here are some approaches to match each row in DF1 to the closest row in DF2 by date.
1) sqldf
library(sqldf)
sqldf("select *, min(abs(date2 - date1)) AbsDiff, date2 - date1 TimeDiff
from DF1 a join DF2 b group by a.rowid")
giving:
date1 Num1 date2 Num2 AbsDiff TimeDiff
1 2013-08-03 1 2013-08-06 1 259200 259200
2 2013-09-04 2 2013-09-12 3 691200 691200
3 2013-09-08 3 2013-09-12 3 345600 345600
4 2013-09-12 4 2013-09-12 3 0 0
5 2013-11-01 5 2013-09-20 2 3628800 -3628800
2) Base R
abs.min <- function(d1) which.min(abs(d1 - DF2$date2))
transform(cbind(DF1, DF2[sapply(DF1$date1, abs.min), ]),
TimeDiff = as.numeric(date2 - date1))
giving:
date1 Num1 date2 Num2 TimeDiff
1 2013-08-03 1 2013-08-06 1 259200
3 2013-09-04 2 2013-09-12 3 691200
3.1 2013-09-08 3 2013-09-12 3 345600
3.2 2013-09-12 4 2013-09-12 3 0
2 2013-11-01 5 2013-09-20 2 -3628800
Upvotes: 2
Reputation: 33498
Not sure why your original solution isn't working but here is an alternative setup that does the job:
setDT(DF1)
setDT(DF2)
DF1[, c("NearestNum2", "NearestDate2") :=
DF2[.SD, on = .(date2 = date1), .(x.Num2, x.date2), roll = "nearest"]
][, TimeDiff := as.numeric(difftime(date1, NearestDate2))]
DF1
date1 Num1 NearestNum2 NearestDate2 TimeDiff
1: 2013-08-03 1 1 2013-08-06 -259200
2: 2013-09-04 2 3 2013-09-12 -691200
3: 2013-09-08 3 3 2013-09-12 -345600
4: 2013-09-12 4 3 2013-09-12 0
5: 2013-11-01 5 2 2013-09-20 3628800
Upvotes: 3