Find a value that is closest in time to another

Question

I have a data frame of events that are linked together by a code. Each event has a count, a date and a time. For a given code, I would like to find the count whose date and time are closest to a given date and time. For example, with this data frame:

x.df <- structure(list(id = 1:20, code = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), count = c(2L, 
3L, 5L, 7L, 8L, 1L, 2L, 7L, 9L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 
4L, 8L, 8L), date = structure(c(1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L, 
5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L), .Label = c("2019-01-01", 
"2019-01-02", "2019-01-03", "2019-02-11", "2019-02-12", "2019-04-22", 
"2019-04-23", "2019-04-24", "2019-04-25"), class = "factor"), 
    time = structure(c(11L, 12L, 10L, 13L, 14L, 1L, 2L, 5L, 7L, 
    17L, 19L, 2L, 3L, 9L, 18L, 4L, 6L, 8L, 15L, 16L), .Label = c("01:01:01", 
    "02:01:02", "02:11:02", "03:01:03", "07:01:07", "09:01:04", 
    "09:01:09", "10:01:04", "12:01:02", "12:10:01", "12:12:12", 
    "12:34:23", "13:15:30", "14:19:23", "18:01:08", "19:01:08", 
    "22:02:12", "23:01:03", "23:02:12"), class = "factor")), class = "data.frame", row.names = c(NA, 
-20L))

I would like a function

findcount(code,date,time)

such that

findcount(1,"2019-01-02","12:00:00") = 5
findcount(2,"2019-02-02","14:10:23") = 1
findcount(3,"2019-04-29","16:10:00") = 8

I have tried subsetting the data, sorting it and then calculating time differences, but it does not work. There may also be a more efficient approach than the one I have in mind. Thanks.

Joseph Crispell · Accepted Answer

I've written a function that works for your examples. Firstly, I created a column in your dataframe that combines the dates and times:

# Create a column that combines the date and time into a single date-time object
# (POSIXct is the class the R documentation recommends for data frame columns)
x.df$DateAndTime <- as.POSIXct(paste(x.df$date, x.df$time))

Then using the following function:

findcount <- function(code, date, time, x.df){

   # Keep only the rows for the requested code
   # (named codeRows so it does not mask base::subset)
   codeRows <- x.df[x.df$code == code, ]

   # Create a date-time object for the input date and time
   currentDateAndTime <- as.POSIXct(paste(date, time))

   # Absolute difference, in seconds, between every event and the input date-time
   differences <- abs(as.numeric(difftime(codeRows$DateAndTime, currentDateAndTime, units = "secs")))

   # Return the count of the closest event
   return(codeRows$count[which.min(differences)])
}

I can quickly find the count, for a given code, whose date and time are closest to the input:

findcount(1, "2019-01-02", "12:00:00", x.df) # returns 5
findcount(2, "2019-02-02", "14:10:23", x.df) # returns 1
findcount(3, "2019-04-29", "16:10:00", x.df) # returns 8
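
As a possible variation, here is a minimal self-contained sketch that builds the date-times inside the function, so no extra column is needed. The names find_count_ct and events are hypothetical, the small data frame is made up for illustration, and fixing the time zone to "UTC" is an assumption that avoids surprises around daylight saving changes. It also returns NA when the code does not exist, which is one way (not the only one) of handling that case.

```r
# Hypothetical miniature data set with the same columns as x.df
events <- data.frame(
  code  = c(1L, 1L, 1L, 2L, 2L),
  count = c(2L, 5L, 8L, 1L, 9L),
  date  = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-02-11", "2019-02-12"),
  time  = c("12:12:12", "12:10:01", "14:19:23", "01:01:01", "09:01:09"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count of the event closest in time for a given code
find_count_ct <- function(code, date, time, df, tz = "UTC") {
  # Keep only the rows for the requested code
  rows <- df[df$code == code, ]
  # Assumption: an unknown code yields NA rather than an error
  if (nrow(rows) == 0) return(NA_integer_)

  # paste() also works if date and time are stored as factors
  stamps <- as.POSIXct(paste(rows$date, rows$time), tz = tz)
  target <- as.POSIXct(paste(date, time), tz = tz)

  # Fixing the units keeps the differences comparable
  gaps <- abs(as.numeric(difftime(stamps, target, units = "secs")))
  rows$count[which.min(gaps)]
}

find_count_ct(1, "2019-01-02", "12:00:00", events)
find_count_ct(2, "2019-02-02", "14:10:23", events)
find_count_ct(3, "2019-01-02", "12:00:00", events) # code 3 is absent here
```

Note that which.min returns the first minimum, so if two events are equally close, the one that appears first in the data frame wins.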

Note that the text format needed to build a single date-time object is quite specific (see this description). Luckily, you were already using a format (YYYY-MM-DD HH:MM:SS) that works without modification.
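
If your dates ever arrive in a different layout, one option is to pass an explicit format string. The sketch below is only an illustration: the day/month/year layout is a made-up example, and the "UTC" time zone is an assumption.

```r
# Default parsing handles the YYYY-MM-DD HH:MM:SS layout directly
iso <- as.POSIXct("2019-01-02 12:10:01", tz = "UTC")

# A different layout (hypothetical) needs an explicit format string
other <- as.POSIXct("02/01/2019 12:10:01",
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Both describe the same instant
iso == other

# With an explicit format, text that does not match becomes NA,
# so it is worth checking for NA values after converting
bad <- as.POSIXct("not a date", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
is.na(bad)
```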
