Nick
Nick

Reputation: 286

Return peak value from one data frame that occurs between two dates of another data frame

The Data:

I have a data set (df1) with a start date and end date for each observation (the actual data has ~50,000 observations).

dateStart <- c("2018-01-23","2017-11-01","2017-11-29")
dateEnd   <- c("2018-01-25","2017-11-02","2017-11-30")
obs       <- c(1,2,3)
dateStart <- as.Date(as.character(dateStart), format = "%Y-%m-%d")
dateEnd   <- as.Date(as.character(dateEnd), format = "%Y-%m-%d")
df1       <- data.frame(obs,dateStart,dateEnd)
df1

obs  dateStart    dateEnd
  1 2018-01-23 2018-01-25
  2 2017-11-01 2017-11-02
  3 2017-11-29 2017-11-30

The other data set (df2) has recorded values (here the data is >150,000 rows of data):

datetime <- c("2018-01-23 14:30:00", "2018-01-23 15:30:00","2017-11-01 12:10:00","2017-11-01 22:59:00","2017-11-29 00:40:00", "2017-11-29 16:50:00") 
value <- c(1.1,1.2,2.1,2.2,3.1,3.2)
date <- as.POSIXct(as.character(datetime), format = "%Y-%m-%d %H:%M:%S")
df2 <- data.frame(datetime,value)
df2

           datetime value
2018-01-23 14:30:00   1.1
2018-01-23 15:30:00   1.2
2017-11-01 12:10:00   2.1
2017-11-01 22:59:00   2.2
2017-11-29 00:40:00   3.1
2017-11-29 16:50:00   3.2

The Question:

How do I insert the peak value from df2 which occurs between the start and end dates in df1? It should look like:

obs  dateStart  dateEnd value
1 2018-01-23 2018-01-25 1.2
2 2017-11-01 2017-11-02 2.2
3 2017-11-29 2017-11-30 3.2

I have used subsetting in a single data frame, but have no idea how to do it between ranges over multiple rows across two data frames.

Any help much appreciated.

Upvotes: 0

Views: 63

Answers (2)

A. Suliman
A. Suliman

Reputation: 13135

Using sqldf

#First convert dateStart and dateEnd in df1 to POSIXct
dateStart <- as.POSIXct(as.character(dateStart))
dateEnd <- as.POSIXct(as.character(dateEnd))

library(sqldf)

sqldf("SELECT obs, df1.dateStart, df1.dateEnd, df2.date, max(df2.value) As value
FROM df2 left JOIN df1
ON df2.date BETWEEN df1.dateStart AND df1.dateEnd
group by 1") #the 3rd column in SELECT i.e. obs

   obs  dateStart    dateEnd                date value
 1  NA       <NA>       <NA> 2017-11-28 21:40:00   3.1    #As 28 is out the interval 29-30
 2   1 2018-01-23 2018-01-25 2018-01-23 12:30:00   1.2
 3   2 2017-11-01 2017-11-02 2017-11-01 19:59:00   2.2
 4   3 2017-11-29 2017-11-30 2017-11-29 13:50:00   3.2              

Upvotes: 2

Sotos
Sotos

Reputation: 51592

Here is a solution (inspired by this answer),

library(data.table)

setDT(df1)[setDT(df2), 
          on = .(dateStart <= datetime, dateEnd > datetime),][, .SD[which.max(value)], by = obs][]

which gives,

   obs  dateStart    dateEnd value
1:   1 2018-01-23 2018-01-23   1.2
2:   2 2017-11-01 2017-11-01   2.2
3:   3 2017-11-29 2017-11-29   3.2

Upvotes: 2

Related Questions