Using dplyr::mutate between two dataframes to create column based on date range

Question

Right now I have two dataframes. One has over 11 million rows, each with a start date, an end date, and other variables. The second dataframe contains daily values for heating degree days (basically a temperature measure).

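set.seed(1)
library(lubridate)
# One row per day in March 2008, each with a random daily value
date.range <- ymd(paste(2008, 3, 1:31, sep = "-"))
daily <- data.frame(date = date.range, value = runif(31, min = 0, max = 45))
# Five sample intervals; the end date is exclusive
intervals <- data.frame(start = daily$date[1:5], end = daily$date[c(6, 9, 15, 24, 31)])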

In reality my daily dataframe has every day for 9 years, and my intervals dataframe has entries that span arbitrary dates in this time period. I want to add a column called nhdd to intervals that sums the values in daily falling within each row's interval (end date exclusive).

For example, in this case the first entry of this new column would be

sum(daily$value[1:5])

and the second would be

sum(daily$value[2:8])

and so on.

I tried using the following code

intervals <- mutate(intervals,nhdd=sum(filter(daily,date>=start&date



This is not working. I think the problem is that I am not referencing the columns correctly, but I'm not sure how to proceed.

I'd really like to use dplyr to solve this and not a loop, because 11 million rows will take long enough even with dplyr. I tried using more of lubridate, but dplyr doesn't seem to support the Period class.

Edit: I'm actually using dates from as.Date now instead of lubridate, but the basic question of how to refer to a different dataframe from within mutate still stands.
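
For reference, here is a minimal sketch of two ways the nhdd column might be computed on the sample data; it is not a definitive solution. mutate evaluates its expression once for the whole column rather than once per row, so the first variant uses rowwise() and refers to the other dataframe explicitly as daily$date and daily$value. The second variant avoids per-row work with a running total, and assumes that daily is sorted by date with one row per day and that every start and end date appears in daily$date. The names by_row, running, and by_cumsum are illustrative only, and the sample data is rebuilt with as.Date.

```r
library(dplyr)

# Sample data, rebuilt with base as.Date instead of lubridate
set.seed(1)
dates <- seq(as.Date("2008-03-01"), by = "day", length.out = 31)
daily <- data.frame(date = dates, value = runif(31, min = 0, max = 45))
intervals <- data.frame(start = dates[1:5], end = dates[c(6, 9, 15, 24, 31)])

# Variant 1: one evaluation per row. Inside mutate, start and end are the
# current row's values; daily is referenced explicitly with $.
by_row <- intervals %>%
  rowwise() %>%
  mutate(nhdd = sum(daily$value[daily$date >= start & daily$date < end])) %>%
  ungroup()

# Variant 2: vectorized. running[i] holds the sum of all values before day i,
# so the total over [start, end) is running[end index] - running[start index].
# Assumption: daily is sorted, has no gaps or duplicate dates, and contains
# every start and end date (otherwise match() returns NA).
running <- c(0, cumsum(daily$value))
by_cumsum <- intervals %>%
  mutate(nhdd = running[match(end, daily$date)] - running[match(start, daily$date)])
```

The rowwise variant stays closest to the original attempt but does one subset per row, so it is likely to be slow on 11 million rows. The running-total variant needs only two lookups per row.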
