Reputation: 899
I have two datasets that I would like to match based on datetime information. Datetime values in dataset 1 represent 10-minute time blocks that end at the given timestamp. I would like to match each row in dataset 2 with the 10-minute time block (and associated data) that it falls within from dataset 1. An example of the data is below:
Dataset 1
datetime count
10/11/2012 16:25 231
10/11/2012 16:35 55
10/11/2012 16:45 0
10/11/2012 16:55 30
10/11/2012 17:05 22
10/11/2012 17:15 431
10/11/2012 17:25 455
10/11/2012 17:35 560
10/11/2012 17:45 7
10/11/2012 17:55 36
10/11/2012 18:05 12
10/11/2012 18:15 144
10/11/2012 18:25 583
Dataset 2
datetime a n dpm
10/11/2012 16:26 2.03 27 3473
10/11/2012 17:24 1.35 28 3636
10/11/2012 18:21 7.63 29 3516
I would like to end up with something that looks like this:
datetime2 a n dpm datetime1 count
10/11/2012 16:26 2.03 27 3473 10/11/2012 16:35 55
10/11/2012 17:24 1.35 28 3636 10/11/2012 17:25 455
10/11/2012 18:21 7.63 29 3516 10/11/2012 18:25 583
I am aware that there are functions such as cbind(), rbind() and merge() that can be used to merge datasets (based on a related question that I asked in the past: How to merge two data frames in r by a common column with mismatched date/time values), but I have been unable to write code that will help in this case. I have tried various combinations of cbind(round(data, "mins")) and cbind(trunc(data, "hours")), but these approaches seem to match multiple time blocks from dataset 1 to each row of dataset 2, rather than just the block that each row falls within. I have tried for hours to find some conditional rule that will solve this problem, but am having no luck. Any help would be greatly appreciated.
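In case it helps, here is one way to recreate the example data directly in R (the object names dataset1 and dataset2 are arbitrary):
# dataset 1: 10-minute block counts
dataset1 <- read.table(text='datetime,count
10/11/2012 16:25,231
10/11/2012 16:35,55
10/11/2012 16:45,0
10/11/2012 16:55,30
10/11/2012 17:05,22
10/11/2012 17:15,431
10/11/2012 17:25,455
10/11/2012 17:35,560
10/11/2012 17:45,7
10/11/2012 17:55,36
10/11/2012 18:05,12
10/11/2012 18:15,144
10/11/2012 18:25,583', header=TRUE, sep=',', stringsAsFactors=FALSE)
dataset1$datetime <- as.POSIXct(dataset1$datetime, format='%d/%m/%Y %H:%M')

# dataset 2: observations to match against the blocks
dataset2 <- read.table(text='datetime,a,n,dpm
10/11/2012 16:26,2.03,27,3473
10/11/2012 17:24,1.35,28,3636
10/11/2012 18:21,7.63,29,3516', header=TRUE, sep=',', stringsAsFactors=FALSE)
dataset2$datetime <- as.POSIXct(dataset2$datetime, format='%d/%m/%Y %H:%M')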
Upvotes: 2
Views: 1799
Reputation: 121568
One option is to use a rolling join from data.table, also known as last observation carried forward (LOCF).
library(data.table)
DT  <- data.table(dataset1, key = 'datetime')  # 10-minute block counts, keyed on datetime
DT1 <- data.table(dataset2, key = 'datetime')  # observations to match
DT[DT1, roll = 'nearest']                      # join each DT1 row to the nearest DT timestamp
datetime count a n dpm
1: 2012-11-10 16:26:00 231 2.03 27 3473
2: 2012-11-10 17:24:00 455 1.35 28 3636
3: 2012-11-10 18:21:00 583 7.63 29 3516
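Note that roll='nearest' matches each observation to the closest timestamp, which is why the first row picks up 231 (from 16:25) rather than 55. If you strictly want the 10-minute block that each observation falls within, i.e. the next block end at or after the observation, a backward roll should do it (a sketch, using the same DT and DT1 as above):
# roll = -Inf rolls backwards (NOCB): each DT1 row is matched to the
# next DT timestamp at or after it, i.e. the end of its 10-minute block
DT[DT1, roll = -Inf]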
EDIT
Here is how you can read the data. Generally I use the zoo package with read.zoo, or one trick is to combine the date and time into a new datetime column, as above:
dat1 <- read.table(text='date time a n dpm
10/11/2012 16:26 2.03 27 3473
10/11/2012 17:24 1.35 28 3636
10/11/2012 18:21 7.63 29 3516', header=TRUE)
dat1$datetime <- as.POSIXct(paste(dat1$date, dat1$time),
                            format='%d/%m/%Y %H:%M')
a n dpm datetime
1 2.03 27 3473 2012-11-10 16:26:00
2 1.35 28 3636 2012-11-10 17:24:00
3 7.63 29 3516 2012-11-10 18:21:00
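The first dataset can be read the same way; only the first few rows are shown here, and dat0 is just a placeholder name:
dat0 <- read.table(text='date time count
10/11/2012 16:25 231
10/11/2012 16:35 55
10/11/2012 16:45 0', header=TRUE)   # ... remaining rows as in the question
dat0$datetime <- as.POSIXct(paste(dat0$date, dat0$time),
                            format='%d/%m/%Y %H:%M')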
Upvotes: 2
Reputation: 4414
Something like this?
# index of the first dataset 1 row whose block end comes after dt
findRow <- function(dt, df) { min(which(df$datetime > dt)) }
rows <- sapply(df2$datetime, findRow, df = df1)  # one matching row per dataset 2 observation
res  <- cbind(df2, df1[rows, ])
datetime a n dpm datetime count
2 10/11/2012 16:26 2.03 27 3473 10/11/2012 16:35 55
7 10/11/2012 17:24 1.35 28 3636 10/11/2012 17:25 455
13 10/11/2012 18:21 7.63 29 3516 10/11/2012 18:25 583
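A vectorized alternative, assuming both datetime columns have been converted to POSIXct, is findInterval (a sketch):
# findInterval gives the index of the last block end <= each observation,
# so adding 1 points at the block end that the observation falls within
rows <- findInterval(df2$datetime, df1$datetime) + 1
res  <- cbind(df2, df1[rows, ])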
PS1: I think the count of your expected result is wrong on row #1
PS2: It would have been easier if you had provided the datasets in a directly usable form. I had to do:
d1 <-
'datetime count
10/11/2012 16:25 231
...
'
d2 <-
'datetime a n dpm
10/11/2012 16:26 2.03 27 3473
10/11/2012 17:24 1.35 28 3636
10/11/2012 18:21 7.63 29 3516
'
.parse <- function(s) {
  # collapse runs of 2+ spaces to tabs so the single space inside each datetime survives
  cs <- gsub('\\s\\s+', '\t', s)
  read.table(text = cs, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
}
df1 <- .parse(d1)
df2 <- .parse(d2)
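Note that .parse leaves the datetime columns as character, and the string comparison in findRow only happens to work here because all timestamps share one date and a fixed-width format. To be safe, convert them to POSIXct first (a sketch):
# compare real date-times rather than strings
df1$datetime <- as.POSIXct(df1$datetime, format='%d/%m/%Y %H:%M')
df2$datetime <- as.POSIXct(df2$datetime, format='%d/%m/%Y %H:%M')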
Upvotes: 3