user3791234
user3791234

Reputation: 141

Summarize R data frame based on a date range in a second data frame

I have two data frames, one that includes data by day, and one that includes data by irregular time multi-day intervals. For example:

A data frame precip_range with precipitation data by irregular time intervals:

start_date<-as.Date(c("2010-11-01", "2010-11-04", "2010-11-10"))
end_date<-as.Date(c("2010-11-03", "2010-11-09", "2010-11-12"))
precipitation<-(c(12, 8, 14))
precip_range<-data.frame(start_date, end_date, precipitation)

And a data frame precip_daily with daily precipitation data:

day<-as.Date(c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-04", "2010-11-05",
                  "2010-11-06", "2010-11-07", "2010-11-08", "2010-11-09", "2010-11-10",
                  "2010-11-11", "2010-11-12"))
precip<-(c(3, 1, 2, 1, 0.25, 1, 3, 0.33, 0.75, 0.5, 1, 2))
precip_daily<-data.frame(day, precip)

In this example, precip_daily represents daily precipitation estimated by a model and precip_range represents measured cumulative precipitation for specific date ranges. I am trying to compare modeled to measured data, which requires synchronizing the time periods.

So, I want to summarize the precip column in data frame precip_daily (count of observations and sum of precip) by the date date ranges between start_date and end_date in the data frame precip_range. Any thoughts on the best way to do this?

Upvotes: 4

Views: 2181

Answers (2)

MrFlick
MrFlick

Reputation: 206242

You can use the start_dates from precip_range as breaks to cut() to group your daily values. For example

rng <- cut(precip_daily$day, 
    breaks=c(precip_range$start_date, max(precip_range$end_date)), 
    include.lowest=T)

Here we cut the values in daily using the start dates in the range data.frame. We're sure to include the lowest value and stop at the largest end value. If we merge that with the daily values we see

cbind(precip_daily, rng)

#           day precip        rng
# 1  2010-11-01   3.00 2010-11-01
# 2  2010-11-02   1.00 2010-11-01
# 3  2010-11-03   2.00 2010-11-01
# 4  2010-11-04   1.00 2010-11-04
# 5  2010-11-05   0.25 2010-11-04
# 6  2010-11-06   1.00 2010-11-04
# 7  2010-11-07   3.00 2010-11-04
# 8  2010-11-08   0.33 2010-11-04
# 9  2010-11-09   0.75 2010-11-04
# 10 2010-11-10   0.50 2010-11-10
# 11 2010-11-11   1.00 2010-11-10
# 12 2010-11-12   2.00 2010-11-10

which shows that the values have been grouped. Then we can do

aggregate(cbind(count=1, sum=precip_daily$precip)~rng, FUN=sum)

#          rng count  sum
# 1 2010-11-01     3 6.00
# 2 2010-11-04     6 6.33
# 3 2010-11-10     3 3.50

To get the total for each of those ranges (ranges as labeled with the start date)

Upvotes: 3

David Arenburg
David Arenburg

Reputation: 92292

Or

library(zoo)
library(data.table)
temp <- merge(precip_daily, precip_range, by.x = "day", by.y = "start_date", all.x = T)
temp$end_date <- na.locf(temp$end_date)
setDT(temp)[, list(Sum = sum(precip), Count = .N), by = end_date]

##     end_date  Sum Count
## 1: 2010-11-03 6.00     3
## 2: 2010-11-09 6.33     6
## 3: 2010-11-12 3.50     3

Upvotes: 1

Related Questions