Reputation: 12087

How do I find the closest date to a given date?

I am trying to figure out how to find the closest date in one zoo object to a given date in another zoo object (a data.frame would also work). Suppose I have:

library(zoo)
dates.zoo <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.zoo <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))

For each date in dates.zoo, I would like to align it with the closest date in monthly.zoo that falls on or before it (NA if there is no such monthly date). So the data.frame/zoo object I am expecting is:

...
2018-12-02   2  NA
...
2018-12-14  14  2018-12-14
2018-12-15  15  2018-12-14
2018-12-16  16  2018-12-14
...
2019-01-01  32  2018-12-14
2019-01-02  33  2019-01-02
2019-01-03  34  2019-01-02
...

NOTE: I would prefer a base R solution, but other approaches would also be interesting to see.

Upvotes: 3

Answers (4)

Denis

Reputation: 12087

Following through on Henrik's suggestion to use findInterval, we can do:

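library(zoo)
# Position of the last monthly date on or before each daily date (0 if none)
interval.idx <- findInterval(index(dates.zoo), index(monthly.zoo))
# 0 means every monthly date is later than the daily date, so there is no match
interval.idx <- ifelse(interval.idx == 0, NA, interval.idx)
dates.zoo$month <- index(monthly.zoo)[interval.idx]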
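A minimal base R sketch of the same findInterval idea, assuming plain Date vectors held in a data.frame rather than zoo objects (the names daily, monthly_dates and pos are hypothetical). Storing the result in a data.frame column keeps it as a Date vector. As with the zoo version, the reference dates must be sorted ascending.

```r
# Hypothetical inputs: the question's daily dates and sorted monthly dates
daily <- data.frame(
  val  = 1:121,
  date = seq(as.Date("2018-12-01"), as.Date("2019-03-31"), by = "day")
)
monthly_dates <- as.Date(c("2018-12-14", "2019-01-02", "2019-02-03"))

# For each daily date, the position of the last monthly date <= it,
# or 0 when every monthly date is later
pos <- findInterval(daily$date, monthly_dates)
pos[pos == 0] <- NA

# Indexing a Date vector with NA positions yields NA dates
daily$month <- monthly_dates[pos]

daily[c(2, 14, 15, 32, 33, 34), ]
```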

Upvotes: 4

IceCreamToucan

Reputation: 28705

If, for each date in dates.df, you want the closest date in monthly.df that is strictly earlier than the given date, and monthly.df is sorted by date ascending, you can use the method below (dates.df and monthly.df here are the question's dates.zoo and monthly.zoo). It counts the rows of monthly.df whose index is earlier than the given date; when monthly.df is sorted ascending, that count is exactly the position of the closest earlier date. If there are no such rows, the position is set to NA.

inds <- rowSums(outer(index(dates.df), index(monthly.df), `>`))
inds[inds == 0] <- NA
dates.df_monthmatch <- index(monthly.df)[inds]


dates.df_monthmatch
#   [1] NA           NA           NA           NA           NA           NA          
#   [7] NA           NA           NA           NA           NA           NA          
#  [13] NA           NA           "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
#  [19] "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
#  [25] "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
#  [31] "2018-12-14" "2018-12-14" "2018-12-14" "2019-01-02" "2019-01-02" "2019-01-02"
#  [37] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
#  [43] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
#  [49] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
#  [55] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
#  [61] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-02-03"
#  [67] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
#  [73] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
#  [79] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
#  [85] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
#  [91] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
#  [97] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [103] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [109] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [115] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [121] "2019-02-03"
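Because the comparison is strict (`>`), a daily date that coincides with a monthly date (position 14, 2018-12-14, above) is matched to the previous monthly date or to NA, whereas the question's expected output matches it to itself. A small sketch, assuming plain Date vectors with hypothetical names, compares the strict count with a `>=` count that reproduces the expected behaviour:

```r
# Hypothetical inputs: a few daily dates and the sorted monthly dates
days    <- as.Date(c("2018-12-13", "2018-12-14", "2019-01-02", "2019-01-03"))
monthly <- as.Date(c("2018-12-14", "2019-01-02", "2019-02-03"))

# Monthly dates strictly earlier than each day (as in the answer above)
strict <- rowSums(outer(days, monthly, `>`))
# Monthly dates on or before each day (same-day dates match themselves)
incl <- rowSums(outer(days, monthly, `>=`))

# A count of 0 means there is no earlier monthly date
strict[strict == 0] <- NA
incl[incl == 0] <- NA

data.frame(day = days, strict = monthly[strict], inclusive = monthly[incl])
```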

Upvotes: 1

Chabo

Reputation: 3000

Here is a possibility, although I did have to convert the objects to data frames in order to assign the zoo index dates. Note that it works in the other direction: for each date in monthly.df, it looks for the last date in dates.df with the same month and year and a day less than or equal to the day being matched. If no date meets these criteria, NA is assigned. The individual date components are extracted with the lubridate package, and which() is used to logically index the best match.

library(zoo)
library(lubridate)

dates.df <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))

month_m<-month(monthly.df)
month_d<-month(dates.df)

year_m<-year(monthly.df)
year_d<-year(dates.df)

day_m<-day(monthly.df)
day_d<-day(dates.df)

index <- list()
Index <- list()

for (i in 1:length(monthly.df)) {
  # Daily dates in the same month and year, on or before the i-th monthly date
  index[[i]] <- which(month_m[i] == month_d & year_m[i] == year_d
                      & day_d <= day_m[i])

  test <- unlist(index[[i]])

  # Assigns NA if no suitable match is found
  if (length(test) == 0) {
    print("NA")
    Index[[i]] <- NA
  } else {
    Index[[i]] <- tail(test, n = 1)
  }
}

Test<-unlist(Index)
monthly.df_Fin<-as.data.frame(monthly.df)
dates.df_Fin<-as.data.frame(dates.df)
monthly.df_Fin$match<-as.character(row.names(dates.df_Fin)[Test])
monthly.df_Fin$value<-dates.df_Fin[Test,]

> monthly.df_Fin
           val      match value
2018-12-14   1 2018-12-14    14
2019-01-02   2 2019-01-02    33
2019-02-03   4 2019-02-03    65

Say we change one monthly date so that it falls outside the range of the daily dates:

monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2017-2-3')))

....

#Result
> monthly.df_Fin
           val      match value
2017-02-03   4       <NA>    NA
2018-12-14   1 2018-12-14    14
2019-01-02   2 2019-01-02    33
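The same criteria (same year and month, and a day on or before the monthly date) can be sketched in base R without lubridate by comparing a "%Y-%m" key built with format(). This is an illustration that assumes plain Date vectors are acceptable; the names below are hypothetical. Like the loop above, it finds, for each monthly date, the last qualifying daily date.

```r
# Hypothetical inputs, using the modified monthly dates from above
daily_dates   <- seq(as.Date("2018-12-01"), as.Date("2019-03-31"), by = "day")
daily_val     <- 1:121
monthly_dates <- as.Date(c("2018-12-14", "2019-01-02", "2017-02-03"))

# Year-month key for every daily date, computed once
daily_ym <- format(daily_dates, "%Y-%m")

match_pos <- vapply(seq_along(monthly_dates), function(i) {
  m <- monthly_dates[i]
  # Same year and month, and not later than the monthly date
  hits <- which(daily_ym == format(m, "%Y-%m") & daily_dates <= m)
  # Last qualifying daily date, or NA when nothing qualifies
  if (length(hits) == 0) NA_integer_ else hits[length(hits)]
}, integer(1))

data.frame(month_date = monthly_dates,
           match      = daily_dates[match_pos],
           value      = daily_val[match_pos])
```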

Upvotes: 0

Soren

Reputation: 2445

A rolling join using data.table can be used. See also: https://www.r-bloggers.com/understanding-data-table-rolling-joins/

A base R solution is included as well.

data.table solution

library(data.table)
dates.df <- data.table(val=seq(1:121), dates = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- data.table(val=c(1,2,4), dates = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))

setkeyv(dates.df,"dates")
setkeyv(monthly.df,"dates")

#monthly.df[,nearest:=(dates)][dates.df,roll = 'nearest'] #closest date
monthly.df[,nearest:=(dates)][dates.df,roll = Inf] #Closest _previous_ date
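The commented-out roll = 'nearest' line picks the closest monthly date in either direction, which is what the question title literally asks. A rough base R counterpart, assuming plain Date vectors (hypothetical names), takes the smallest absolute difference in days. On an exact tie, which.min() keeps the first (earlier) monthly date; this may not match data.table's tie handling.

```r
# Hypothetical inputs: the question's daily dates and monthly dates
dates   <- seq(as.Date("2018-12-01"), as.Date("2019-03-31"), by = "day")
monthly <- as.Date(c("2018-12-14", "2019-01-02", "2019-02-03"))

# Absolute distance in days: rows are daily dates, columns are monthly dates
dist <- abs(outer(as.numeric(dates), as.numeric(monthly), "-"))

# Column with the smallest distance in each row; ties go to the first column
nearest_pos <- apply(dist, 1, which.min)

result <- data.frame(date = dates, nearest = monthly[nearest_pos])
result[20:26, ]
```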

base R solution

dates.df <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))

dates.df <- data.frame(val=dates.df$val,dates=attributes(dates.df)$index)
monthly.df <- data.frame(val=monthly.df$val,dates=attributes(monthly.df)$index)

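# Signed distance in days from every daily date (rows) to every monthly date (columns)
min_distances <- as.numeric(dates.df$dates) - matrix(rep(as.numeric(monthly.df$dates), nrow(dates.df)), ncol=length(monthly.df$dates), byrow=TRUE)
# Transpose so that each column holds the distances for one daily date
min_distances <- as.data.frame(t(min_distances))

# Position of the smallest positive distance, i.e. the closest strictly earlier monthly date
closest <- sapply(min_distances, function(x) {
  w <- which(x == min(x[x > 0]))
  ifelse(length(w) == 0, NA, w)
})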

dates.df$closest_month <- monthly.df$dates[closest]

Results: data.table

> monthly.df[,nearest:=(dates)][dates.df,roll = Inf]
     val      dates    nearest i.val
  1:  NA 2018-12-01       <NA>     1
  2:  NA 2018-12-02       <NA>     2
  3:  NA 2018-12-03       <NA>     3
  4:  NA 2018-12-04       <NA>     4
  5:  NA 2018-12-05       <NA>     5
 ---                                
118:   4 2019-03-27 2019-02-03   117
119:   4 2019-03-28 2019-02-03   118
120:   4 2019-03-29 2019-02-03   119
121:   4 2019-03-30 2019-02-03   120
122:   4 2019-03-31 2019-02-03   121

Results: base R

> dates.df[64:69,]
           val      dates closest_month
2019-02-02  64 2019-02-02    2019-01-02
2019-02-03  65 2019-02-03    2019-01-02
2019-02-04  66 2019-02-04    2019-02-03
2019-02-05  67 2019-02-05    2019-02-03
2019-02-06  68 2019-02-06    2019-02-03
2019-02-07  69 2019-02-07    2019-02-03
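Because of the x > 0 test, the distance approach skips a monthly date that falls on the same day (in the base R results above, 2019-02-03 is matched to 2019-01-02), and for the earliest dates min() is called on an empty vector, which returns Inf with a warning. A sketch of the same idea that keeps same-day matches and avoids the warning, assuming plain Date vectors (hypothetical names), masks negative distances with NA first:

```r
# Hypothetical inputs: the question's daily dates and monthly dates
dates   <- seq(as.Date("2018-12-01"), as.Date("2019-03-31"), by = "day")
monthly <- as.Date(c("2018-12-14", "2019-01-02", "2019-02-03"))

# Signed distance in days: rows are daily dates, columns are monthly dates
d <- outer(as.numeric(dates), as.numeric(monthly), "-")
# Monthly dates after the given date are not candidates
d[d < 0] <- NA

# Smallest non-negative distance per row; NA when no monthly date qualifies
closest <- apply(d, 1, function(x) {
  if (all(is.na(x))) NA_integer_ else which.min(x)
})

result <- data.frame(date = dates, closest_month = monthly[closest])
result[64:69, ]
```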