Reputation: 12087
I am trying to figure out how to find the closest date in 1 zoo object to a given date in another zoo object (could also use data.frame). Suppose I have:
dates.zoo <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.zoo <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
For each date in dates.zoo I would like to align it with the closest previous date in monthly.zoo (NA if no monthly date is found). So the data.frame/zoo object I am expecting is:
...
2018-12-02 2 NA
...
2018-12-14 14 2018-12-14
2018-12-15 15 2018-12-14
2018-12-16 16 2018-12-14
...
2019-01-01 32 2018-12-14
2019-01-02 33 2019-01-02
2019-01-03 34 2019-01-02
...
NOTE: I would prefer a Base-R solution but others would be interesting to see also
Upvotes: 3
Views: 3214
Reputation: 12087
Following through on Henrik's suggestion to use findInterval, we can do:
library(zoo)
interval.idx <- findInterval(index(dates.zoo), index(monthly.zoo))
interval.idx <- ifelse(interval.idx == 0, NA, interval.idx)
dates.zoo$month <- index(monthly.zoo)[interval.idx]
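To see why the zero index has to become NA, here is a minimal sketch on toy data. The vectors daily and anchors are made up for illustration and are not the question's objects. For each element of its first argument, findInterval returns how many elements of the sorted second argument are less than or equal to it. That count is the position of the closest previous date, or 0 when no earlier date exists.

```r
# Toy data, assumed for illustration only
daily   <- as.Date(c("2018-12-13", "2018-12-14", "2018-12-15", "2019-01-02"))
anchors <- as.Date(c("2018-12-14", "2019-01-02"))  # must be sorted ascending

# findInterval works on numerics, so convert the dates explicitly
idx <- findInterval(as.numeric(daily), as.numeric(anchors))
idx[idx == 0] <- NA  # 0 means no anchor on or before this date

# A plain data.frame keeps the matched dates as Date values
res <- data.frame(date = daily, closest_prev = anchors[idx])
print(res)
```

A date that equals an anchor is matched to itself here, which is what the expected output in the question asks for.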
Upvotes: 4
Reputation: 28705
If, for each date in dates.df, you want the closest date in monthly.df that is strictly earlier than the given date, and monthly.df is sorted by date ascending, you can use the method below. It counts the number of rows in monthly.df with an index less than the given date, which equals the position of the match when monthly.df is sorted by date ascending. If there are 0 such rows, the index is changed to NA.
inds <- rowSums(outer(index(dates.df), index(monthly.df), `>`))
inds[inds == 0] <- NA
dates.df_monthmatch <- index(monthly.df)[inds]
dates.df_monthmatch
# [1] NA NA NA NA NA NA
# [7] NA NA NA NA NA NA
# [13] NA NA "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
# [19] "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
# [25] "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14" "2018-12-14"
# [31] "2018-12-14" "2018-12-14" "2018-12-14" "2019-01-02" "2019-01-02" "2019-01-02"
# [37] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [43] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [49] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [55] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02"
# [61] "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-01-02" "2019-02-03"
# [67] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [73] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [79] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [85] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [91] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [97] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [103] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [109] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [115] "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03" "2019-02-03"
# [121] "2019-02-03"
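The question's expected output pairs 2018-12-14 with itself, while the `>` comparison above only matches strictly earlier dates. A minimal sketch of the inclusive variant swaps `>` for `>=`. The vectors daily and anchors are assumed names on made-up data, not objects from this answer.

```r
# Toy data, assumed for illustration only
daily   <- as.Date(c("2018-12-13", "2018-12-14", "2018-12-15"))
anchors <- as.Date(c("2018-12-14", "2019-01-02"))  # sorted ascending

# Count, for each daily date, the anchors that fall on or before it
inds <- rowSums(outer(as.numeric(daily), as.numeric(anchors), `>=`))
inds[inds == 0] <- NA  # no anchor on or before this date

res <- anchors[inds]
print(res)
```

Note that outer builds a full length(daily) by length(anchors) matrix, so this approach is probably best kept for small inputs.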
Upvotes: 1
Reputation: 3000
Here is a possibility, although I did have to convert the object to a data frame in order to assign the zoo index dates. This code compares the month, then the year, and finally the day, with the criterion that the date is less than or equal to the date to be matched against. If no date meets this criterion, an NA is assigned. These comparisons were done with the package 'lubridate', checking the individual date elements, and then which to logically index the best match.
library(zoo)
library(lubridate)
dates.df <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
month_m<-month(monthly.df)
month_d<-month(dates.df)
year_m<-year(monthly.df)
year_d<-year(dates.df)
day_m<-day(monthly.df)
day_d<-day(dates.df)
index<-list()
Index<-list()
for( i in 1:length(monthly.df)){
  index[[i]]<-which(month_m[i] == month_d & year_m[i] == year_d
                    & day_d <= day_m[i])
  test<-unlist(index[[i]])
  #Assigns NA if no suitable match is found
  if(length(test)==0){
    print("NA")
    Index[[i]]=NA
  }else {
    Index[[i]]<-tail(test, n=1)
  }
}
Test<-unlist(Index)
monthly.df_Fin<-as.data.frame(monthly.df)
dates.df_Fin<-as.data.frame(dates.df)
monthly.df_Fin$match<-as.character(row.names(dates.df_Fin)[Test])
monthly.df_Fin$value<-dates.df_Fin[Test,]
> monthly.df_Fin
val match value
2018-12-14 1 2018-12-14 14
2019-01-02 2 2019-01-02 33
2019-02-03 4 2019-02-03 65
Say we changed a value to fall outside of the criteria range:
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2017-2-3')))
....
#Result
> monthly.df_Fin
val match value
2017-02-03 4 <NA> NA
2018-12-14 1 2018-12-14 14
2019-01-02 2 2019-01-02 33
Upvotes: 0
Reputation: 2445
A rolling join using data.table can be used. See also: https://www.r-bloggers.com/understanding-data-table-rolling-joins/
A base-R solution is also included after the data.table code.
library(data.table)
dates.df <- data.table(val=seq(1:121), dates = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- data.table(val=c(1,2,4), dates = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
setkeyv(dates.df,"dates")
setkeyv(monthly.df,"dates")
#monthly.df[,nearest:=(dates)][dates.df,roll = 'nearest'] #closest date
monthly.df[,nearest:=(dates)][dates.df,roll = Inf] #Closest _previous_ date
dates.df <- zoo(data.frame(val=seq(1:121)), order.by = seq.Date(as.Date('2018-12-01'), as.Date('2019-03-31'), "days"))
monthly.df <- zoo(data.frame(val=c(1,2,4)), order.by = c(as.Date('2018-12-14'), as.Date('2019-1-2'), as.Date('2019-2-3')))
dates.df <- data.frame(val=dates.df$val,dates=attributes(dates.df)$index)
monthly.df <- data.frame(val=monthly.df$val,dates=attributes(monthly.df)$index)
min_distances <- as.numeric(dates.df$dates)- matrix(rep(as.numeric(monthly.df$dates),nrow(dates.df)),ncol=length(monthly.df$dates),byrow=T)
min_distances <- as.data.frame(t(min_distances))
closest <- sapply(min_distances, function(x) {
  # position of the smallest strictly positive distance, NA if none exists
  w <- which(x==min(x[x>0]))
  ifelse(length(w)==0,NA,w)
})
dates.df$closest_month <- monthly.df$dates[closest]
> monthly.df[,nearest:=(dates)][dates.df,roll = Inf]
val dates nearest i.val
1: NA 2018-12-01 <NA> 1
2: NA 2018-12-02 <NA> 2
3: NA 2018-12-03 <NA> 3
4: NA 2018-12-04 <NA> 4
5: NA 2018-12-05 <NA> 5
---
118: 4 2019-03-27 2019-02-03 117
119: 4 2019-03-28 2019-02-03 118
120: 4 2019-03-29 2019-02-03 119
121: 4 2019-03-30 2019-02-03 120
122: 4 2019-03-31 2019-02-03 121
> dates.df[64:69,]
val dates closest_month
2019-02-02 64 2019-02-02 2019-01-02
2019-02-03 65 2019-02-03 2019-01-02
2019-02-04 66 2019-02-04 2019-02-03
2019-02-05 67 2019-02-05 2019-02-03
2019-02-06 68 2019-02-06 2019-02-03
2019-02-07 69 2019-02-07 2019-02-03
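The commented-out roll = 'nearest' line above looks for the closest date in either direction rather than the closest previous one. As a rough base-R sketch of that idea, assuming made-up vectors daily and anchors (not the objects used in this answer), one can take the smallest absolute distance per date:

```r
# Toy data, assumed for illustration only
daily   <- as.Date(c("2018-12-01", "2018-12-20", "2018-12-30", "2019-01-10"))
anchors <- as.Date(c("2018-12-14", "2019-01-02"))

# Absolute distance in days between every daily date and every anchor
dist_days <- abs(outer(as.numeric(daily), as.numeric(anchors), `-`))

# Column position of the smallest distance in each row
nearest_idx <- apply(dist_days, 1, which.min)
nearest <- anchors[nearest_idx]

print(data.frame(date = daily, nearest = nearest))
```

Unlike the previous-date variants, every date gets a match here, so no NA handling is needed. On a tie, which.min picks the first (earlier) anchor, which may or may not agree with how data.table breaks ties.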
Upvotes: 3