Reputation: 141
I have two data frames, one that includes data by day, and one that includes data by irregular time multi-day intervals. For example:
A data frame precip_range
with precipitation data by irregular time intervals:
start_date<-as.Date(c("2010-11-01", "2010-11-04", "2010-11-10"))
end_date<-as.Date(c("2010-11-03", "2010-11-09", "2010-11-12"))
precipitation<-(c(12, 8, 14))
precip_range<-data.frame(start_date, end_date, precipitation)
And a data frame precip_daily
with daily precipitation data:
day<-as.Date(c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-04", "2010-11-05",
"2010-11-06", "2010-11-07", "2010-11-08", "2010-11-09", "2010-11-10",
"2010-11-11", "2010-11-12"))
precip<-(c(3, 1, 2, 1, 0.25, 1, 3, 0.33, 0.75, 0.5, 1, 2))
precip_daily<-data.frame(day, precip)
In this example, precip_daily
represents daily precipitation estimated by a model and precip_range
represents measured cumulative precipitation for specific date ranges. I am trying to compare modeled to measured data, which requires synchronizing the time periods.
So, I want to summarize the precip
column in data frame precip_daily
(count of observations and sum of precip
) by the date date ranges between start_date
and end_date
in the data frame precip_range
. Any thoughts on the best way to do this?
Upvotes: 4
Views: 2181
Reputation: 206242
You can use the start_dates from precip_range
as breaks to cut()
to group your daily values. For example
rng <- cut(precip_daily$day,
breaks=c(precip_range$start_date, max(precip_range$end_date)),
include.lowest=T)
Here we cut the values in daily using the start dates in the range data.frame. We're sure to include the lowest value and stop at the largest end value. If we merge that with the daily values we see
cbind(precip_daily, rng)
# day precip rng
# 1 2010-11-01 3.00 2010-11-01
# 2 2010-11-02 1.00 2010-11-01
# 3 2010-11-03 2.00 2010-11-01
# 4 2010-11-04 1.00 2010-11-04
# 5 2010-11-05 0.25 2010-11-04
# 6 2010-11-06 1.00 2010-11-04
# 7 2010-11-07 3.00 2010-11-04
# 8 2010-11-08 0.33 2010-11-04
# 9 2010-11-09 0.75 2010-11-04
# 10 2010-11-10 0.50 2010-11-10
# 11 2010-11-11 1.00 2010-11-10
# 12 2010-11-12 2.00 2010-11-10
which shows that the values have been grouped. Then we can do
aggregate(cbind(count=1, sum=precip_daily$precip)~rng, FUN=sum)
# rng count sum
# 1 2010-11-01 3 6.00
# 2 2010-11-04 6 6.33
# 3 2010-11-10 3 3.50
To get the total for each of those ranges (ranges as labeled with the start date)
Upvotes: 3
Reputation: 92292
Or
library(zoo)
library(data.table)
temp <- merge(precip_daily, precip_range, by.x = "day", by.y = "start_date", all.x = T)
temp$end_date <- na.locf(temp$end_date)
setDT(temp)[, list(Sum = sum(precip), Count = .N), by = end_date]
## end_date Sum Count
## 1: 2010-11-03 6.00 3
## 2: 2010-11-09 6.33 6
## 3: 2010-11-12 3.50 3
Upvotes: 1