Reputation: 286
I have a data set (df1) with a start date and end date for each observation (the actual data has ~50,000 observations).
dateStart <- c("2018-01-23","2017-11-01","2017-11-29")
dateEnd <- c("2018-01-25","2017-11-02","2017-11-30")
obs <- c(1,2,3)
dateStart <- as.Date(as.character(dateStart), format = "%Y-%m-%d")
dateEnd <- as.Date(as.character(dateEnd), format = "%Y-%m-%d")
df1 <- data.frame(obs,dateStart,dateEnd)
df1
obs dateStart dateEnd
1 2018-01-23 2018-01-25
2 2017-11-01 2017-11-02
3 2017-11-29 2017-11-30
The other data set (df2) has recorded values (here the data is >150,000 rows of data):
datetime <- c("2018-01-23 14:30:00", "2018-01-23 15:30:00","2017-11-01 12:10:00","2017-11-01 22:59:00","2017-11-29 00:40:00", "2017-11-29 16:50:00")
value <- c(1.1,1.2,2.1,2.2,3.1,3.2)
date <- as.POSIXct(as.character(datetime), format = "%Y-%m-%d %H:%M:%S")
df2 <- data.frame(datetime,value)
df2
datetime value
2018-01-23 14:30:00 1.1
2018-01-23 15:30:00 1.2
2017-11-01 12:10:00 2.1
2017-11-01 22:59:00 2.2
2017-11-29 00:40:00 3.1
2017-11-29 16:50:00 3.2
How do I insert the peak value from df2 which occurs between the start and end dates in df1? It should look like:
obs dateStart dateEnd value
1 2018-01-23 2018-01-25 1.2
2 2017-11-01 2017-11-02 2.2
3 2017-11-29 2017-11-30 3.2
I have used subsetting in a single data frame, but have no idea how to do it between ranges over multiple rows across two data frames.
Any help much appreciated.
Upvotes: 0
Views: 63
Reputation: 13135
Using sqldf
#First convert dateStart and dateEnd in df1 to POSIXct
dateStart <- as.POSIXct(as.character(dateStart))
dateEnd <- as.POSIXct(as.character(dateEnd))
library(sqldf)
sqldf("SELECT obs, df1.dateStart, df1.dateEnd, df2.date, max(df2.value) As value
FROM df2 left JOIN df1
ON df2.date BETWEEN df1.dateStart AND df1.dateEnd
group by 1") #the 3rd column in SELECT i.e. obs
obs dateStart dateEnd date value
1 NA <NA> <NA> 2017-11-28 21:40:00 3.1 #As 28 is out the interval 29-30
2 1 2018-01-23 2018-01-25 2018-01-23 12:30:00 1.2
3 2 2017-11-01 2017-11-02 2017-11-01 19:59:00 2.2
4 3 2017-11-29 2017-11-30 2017-11-29 13:50:00 3.2
Upvotes: 2
Reputation: 51592
Here is a data.table solution (inspired by this answer),
library(data.table)
setDT(df1)[setDT(df2),
on = .(dateStart <= datetime, dateEnd > datetime),][, .SD[which.max(value)], by = obs][]
which gives,
obs dateStart dateEnd value 1: 1 2018-01-23 2018-01-23 1.2 2: 2 2017-11-01 2017-11-01 2.2 3: 3 2017-11-29 2017-11-29 3.2
Upvotes: 2