Reputation: 899
I have two datasets that I would like to match based on datetime information. Datetime values in dataset 1 represent 10-minute time blocks that end at the given timestamp. I would like to match each row in dataset 2 with the 10-minute time block (and associated data) that it falls within from dataset 1. An example of the data is below:
Dataset 1
datetime count
10/11/2012 16:25 231
10/11/2012 16:35 55
10/11/2012 16:45 0
10/11/2012 16:55 30
10/11/2012 17:05 22
10/11/2012 17:15 431
10/11/2012 17:25 455
10/11/2012 17:35 560
10/11/2012 17:45 7
10/11/2012 17:55 36
10/11/2012 18:05 12
10/11/2012 18:15 144
10/11/2012 18:25 583
Dataset 2
datetime a n dpm
10/11/2012 16:26 2.03 27 3473
10/11/2012 17:24 1.35 28 3636
10/11/2012 18:21 7.63 29 3516
I would like to end up with something that looks like this:
datetime2 a n dpm datetime1 count
10/11/2012 16:26 2.03 27 3473 10/11/2012 16:35 55
10/11/2012 17:24 1.35 28 3636 10/11/2012 17:25 455
10/11/2012 18:21 7.63 29 3516 10/11/2012 18:25 583
I am aware that there are functions such as cbind(), rbind() and merge() that can be used to merge datasets (based on a related question that I asked in the past: How to merge two data frames in r by a common column with mismatched date/time values), but I have been unable to write code that will help in this case. I have tried various combinations of cbind(round(data, "mins")) and cbind(trunc(data, "hours")), but these approaches seem to match multiple time blocks from dataset 1 to each row of dataset 2, rather than just the block that each row falls within. I have tried for hours to find some conditional rule that will solve this problem, but am having no luck. Any help would be greatly appreciated.
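In case it helps, here is one way to recreate the example data directly in R (the object names dataset1 and dataset2 are arbitrary):
# dataset 1: 10-minute block counts
dataset1 <- read.table(text='datetime,count
10/11/2012 16:25,231
10/11/2012 16:35,55
10/11/2012 16:45,0
10/11/2012 16:55,30
10/11/2012 17:05,22
10/11/2012 17:15,431
10/11/2012 17:25,455
10/11/2012 17:35,560
10/11/2012 17:45,7
10/11/2012 17:55,36
10/11/2012 18:05,12
10/11/2012 18:15,144
10/11/2012 18:25,583', header=TRUE, sep=',', stringsAsFactors=FALSE)
dataset1$datetime <- as.POSIXct(dataset1$datetime, format='%d/%m/%Y %H:%M')

# dataset 2: observations to match against the blocks
dataset2 <- read.table(text='datetime,a,n,dpm
10/11/2012 16:26,2.03,27,3473
10/11/2012 17:24,1.35,28,3636
10/11/2012 18:21,7.63,29,3516', header=TRUE, sep=',', stringsAsFactors=FALSE)
dataset2$datetime <- as.POSIXct(dataset2$datetime, format='%d/%m/%Y %H:%M')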
Upvotes: 2
Views: 1799
Reputation: 121568
One option is to use a rolling join from data.table, also known as last observation carried forward (LOCF).
library(data.table)
DT  <- data.table(dataset1, key = 'datetime')  # 10-minute block counts, keyed on datetime
DT1 <- data.table(dataset2, key = 'datetime')  # observations to match
DT[DT1, roll = 'nearest']                      # join each DT1 row to the nearest DT timestamp
datetime count a n dpm
1: 2012-11-10 16:26:00 231 2.03 27 3473
2: 2012-11-10 17:24:00 455 1.35 28 3636
3: 2012-11-10 18:21:00 583 7.63 29 3516
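Note that roll='nearest' matches each observation to the closest timestamp, which is why the first row picks up 231 (from 16:25) rather than 55. If you strictly want the 10-minute block that each observation falls within, i.e. the next block end at or after the observation, a backward roll should do it (a sketch, using the same DT and DT1 as above):
# roll = -Inf rolls backwards (NOCB): each DT1 row is matched to the
# next DT timestamp at or after it, i.e. the end of its 10-minute block
DT[DT1, roll = -Inf]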
EDIT
Here is how you can read the data. Generally I use the zoo package with read.zoo, or one trick is to combine the date and time into a new datetime column, as above:
dat1 <- read.table(text='date time a n dpm
10/11/2012 16:26 2.03 27 3473
10/11/2012 17:24 1.35 28 3636
10/11/2012 18:21 7.63 29 3516', header=TRUE)
dat1$datetime <- as.POSIXct(paste(dat1$date, dat1$time),
                            format='%d/%m/%Y %H:%M')
a n dpm datetime
1 2.03 27 3473 2012-11-10 16:26:00
2 1.35 28 3636 2012-11-10 17:24:00
3 7.63 29 3516 2012-11-10 18:21:00
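The first dataset can be read the same way; only the first few rows are shown here, and dat0 is just a placeholder name:
dat0 <- read.table(text='date time count
10/11/2012 16:25 231
10/11/2012 16:35 55
10/11/2012 16:45 0', header=TRUE)   # ... remaining rows as in the question
dat0$datetime <- as.POSIXct(paste(dat0$date, dat0$time),
                            format='%d/%m/%Y %H:%M')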
Upvotes: 2
Reputation: 4414
Something like this?
# index of the first dataset 1 row whose block end comes after dt
findRow <- function(dt, df) { min(which(df$datetime > dt)) }
rows <- sapply(df2$datetime, findRow, df = df1)  # one matching row per dataset 2 observation
res  <- cbind(df2, df1[rows, ])
datetime a n dpm datetime count
2 10/11/2012 16:26 2.03 27 3473 10/11/2012 16:35 55
7 10/11/2012 17:24 1.35 28 3636 10/11/2012 17:25 455
13 10/11/2012 18:21 7.63 29 3516 10/11/2012 18:25 583
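A vectorized alternative, assuming both datetime columns have been converted to POSIXct, is findInterval (a sketch):
# findInterval gives the index of the last block end <= each observation,
# so adding 1 points at the block end that the observation falls within
rows <- findInterval(df2$datetime, df1$datetime) + 1
res  <- cbind(df2, df1[rows, ])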
PS1: I think the count of your expected result is wrong on row #1
PS2: It would have been easier if you had provided the datasets in a directly usable form. I had to do:
d1 <-
'datetime count
10/11/2012 16:25 231
...
'
d2 <-
'datetime a n dpm
10/11/2012 16:26 2.03 27 3473
10/11/2012 17:24 1.35 28 3636
10/11/2012 18:21 7.63 29 3516
'
.parse <- function(s) {
  # collapse runs of 2+ spaces to tabs so the single space inside each datetime survives
  cs <- gsub('\\s\\s+', '\t', s)
  read.table(text = cs, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
}
df1 <- .parse(d1)
df2 <- .parse(d2)
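Note that .parse leaves the datetime columns as character, and the string comparison in findRow only happens to work here because all timestamps share one date and a fixed-width format. To be safe, convert them to POSIXct first (a sketch):
# compare real date-times rather than strings
df1$datetime <- as.POSIXct(df1$datetime, format='%d/%m/%Y %H:%M')
df2$datetime <- as.POSIXct(df2$datetime, format='%d/%m/%Y %H:%M')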
Upvotes: 3