ricardo
ricardo

Reputation: 8425

Vectorise find closest date function

I would like to pass in a vector of dates, and have returned the closest date from a second vector of (partially matching) dates.

The following function does what I require for a single date, however i cannot figure out how to generalise this to the case where searchDate is a vector of dates.

closestDate <- function(searchDate, dateList, roundDown=FALSE){
  if (roundDown) {
    dist2date <- as.Date(dateList) - as.Date(searchDate)
    closest <- which(max(dist2date[dist2date<=0]) == dist2date)
  } else {
    dist2date <- as.Date(dateList) - as.Date(searchDate)
    closest <- which(min(dist2date[dist2date>=0]) == dist2date)
  }
  return(dateList[closest])
}

dateSeq <- seq(as.Date("2011-01-01"), as.Date("2012-12-19"), by='day')
oddDates <- dateSeq[as.logical(1:length(dateSeq) %%2)]

closestDate('2012-12-14', oddDates)
[1] "2012-12-15"

miscDatesLong <- rep(c('2012-12-14', '2012-12-16', '2012-12-18'), 100 )
closestDate(miscDatesLong, oddDates)

closestDate(miscDatesLong, oddDates)
[1] "2012-12-15" "2012-12-17" "2012-12-19"
Warning message:
In unclass(time1) - unclass(time2) :
  longer object length is not a multiple of shorter object length

Could someone please help?

Upvotes: 5

Views: 2563

Answers (6)

Greg Snow
Greg Snow

Reputation: 49640

The findInterval function can do this quickly:

dateSeq <- seq(as.Date("2011-01-01"), as.Date("2012-12-19"), by = 'day')
oddDates <- dateSeq[as.logical(1:length(dateSeq) %% 2)]

oddDates[findInterval(as.Date('2012-12-14'), oddDates) + 1]

miscDatesLong <- rep(c('2012-12-14', '2012-12-16', '2012-12-18'), 100)

oddDates[findInterval(as.Date(miscDatesLong), oddDates) + 1]

To round down instead of up remove the + 1. If you really want to find the closest date, not the one just before or after you can create a new list of dates that are the midpoints of the intervals

as.Date(rowMeans(embed(as.numeric(oddDates), 2)), '1960-01-01')

and apply findInterval() on those. See the arguments to findInterval() for other options.

Upvotes: 6

IRTFM
IRTFM

Reputation: 263362

Now, with the example, just work on the subset of dates that are less than in one case or greater than in the other case, the particular target being examined at the time.

closestDt <- function(searchDate, dateList, roundDown=FALSE) 
     as.Date( sapply( searchDate , function (x) if( roundDown ){ 
                max( dateList[ dateList <= x ] ) } else {
                min( dateList[ dateList >= x])  } 
           ), "1970-01-01")

Upvotes: 3

Theodore Lytras
Theodore Lytras

Reputation: 3965

I think this is what you want:

closestDate <- function(searchDate, dateList, roundDown=FALSE) {
  as.Date(sapply(as.Date(searchDate), function(x){
    dist <- abs(x - as.Date(dateList))
    closest <- dateList[which(min(dist) == dist)]
    return(ifelse(roundDown, min(closest), max(closest)))
  }), origin="1970-1-1")
}

sapply is your friend. You just have to make sure that a date is returned instead of an integer.

Upvotes: 2

James
James

Reputation: 66844

You can use cut:

nearestDate <- function(dates,datesToMatch)
{
        dtm <- sort(datesToMatch)
        dtmMid <- dtm[-length(dtm)]+diff(dtm)/2
        as.Date(cut(dates,
        breaks=c(as.Date("1970-01-01"),
        dtmMid,as.Date("2100-01-01")),labels=dtm))
}

dates1 <- as.Date(c("2012-02-14","2012-06-23","2012-08-27","2012-12-01"))
dates2 <- as.Date(c("2012-04-01","2012-10-31","2012-12-25"))
nearestDate(dates1,dates2)
[1] "2012-04-01" "2012-04-01" "2012-10-31" "2012-12-25"

Note that I've had to pick some magic dates for the end points in the cut function since it does not accept +/-Inf. Amend as appropriate for you use.

Upvotes: 2

Spacedman
Spacedman

Reputation: 94202

?Vectorize

> closestDateV = Vectorize(closestDate,"searchDate")
> closestDateV(c('2012-12-15','2012-12-14'), oddDates)
2012-12-15 2012-12-14 
     15689      15689 

The returned values have had their date-ness removed. So add it back:

> as.Date(closestDateV(c('2012-12-15','2012-12-14'), oddDates),origin="1970-01-01")
  2012-12-15   2012-12-14 
"2012-12-15" "2012-12-15" 

You might want to wrap that all up in a new function.

Functional programming is fun!

Upvotes: 4

Anthony Damico
Anthony Damico

Reputation: 6104

# initiate a tie-breaking function
tie.breaker <-
    function( x , y , la = look.after ){

        # if look.after is TRUE, eliminate all values below x
        # otherwise, eliminate all values above x
        if ( la ) y[ y < x ] <- NA else y[ y > x ] <- NA

        # then among the remaining values, figure out the date the shortest distance away
        z <- which.min( abs( x - y ) )[1]
        # use [1] to just take the first result, in case y contains duplicate dates

        # return z
        return( z )
    }

# initiate your main function
closestDate <- 
    function( searchDate , dateList , look.after = FALSE ){

        # apply a which.min( abs( ) ) command to each of the dates given, 
        # across every date in the larger list
        dist2date <- 
            sapply( 

                # on every element of searchDate..
                as.Date( searchDate ) ,

                # ..run the tie.breaker() function
                tie.breaker , 

                # and each time, pass in the dateList
                as.Date( dateList ) ,

                # and also the look.after TRUE/FALSE flag
                look.after
            )

        # return the matching dates in the same order as passed in
        dateList[ dist2date ]
    }

# try with two input dates
searchDate <- c( '2012-12-14' , '2012-11-18' )

# create a few dates to test against..
someDates <- c( '2012-11-12' ,  '2012-11-17' , '2012-12-15' , '2012-12-13' , '2012-12-15' , '2012-11-17' , '2012-11-20' )

# return the two dates closests to the inputted dates

# the first result gives 12/13, because look.after = FALSE
closestDate( searchDate , someDates )

# the first result gives 12/15, because look.after = TRUE
closestDate( searchDate , someDates , look.after = TRUE )

# reverse the order to prove it still works
someDates <- c( '2012-11-12' , '2012-11-17' , '2012-12-13' , '2012-12-15' , '2012-12-13' , '2012-12-15' , '2012-11-17' )

# the first result gives 12/13, because look.after = FALSE
closestDate( searchDate , someDates )

# the first result gives 12/15, because look.after = TRUE
closestDate( searchDate , someDates , look.after = TRUE )

Upvotes: 2

Related Questions