Reputation: 1252
I have a dataset with unequally spaced observations, and there are often several observations per day. I'd like to apply a function to windows of my data, but I want the windows to be defined by time rather than by row. For example, I'd like to compute the mean for days 1-5, days 2-6, etc. within my dataset, where days 1-5 may correspond to rows 1-13, days 2-6 to rows 3-18, etc.
I saw that the rollapply function accepts zoo objects, and I assumed it would work as I describe above (i.e. applying the function over windows defined by time rather than windows defined by rows). However, this doesn't seem to be the case:
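library(zoo)
# one observation per day
my.ts = zoo( 1:100, as.Date("201401","%Y%j")+1:100 )
mean1 = rollapply( my.ts, 3, mean, align="right" )
# two observations per day
my.ts = zoo( 1:100, as.Date("201401","%Y%j")+1:100/2 )
mean2 = rollapply( my.ts, 3, mean, align="right" )
# compare the values only; == on zoo objects first aligns them by index
all( coredata(mean1)==coredata(mean2) )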
I'd expect mean2 to be different from mean1, since the series behind mean2 has two observations per day instead of one. However, it appears that rollapply uses rows to define the windows rather than the times from the zoo object. Is there a workaround for this? Or is there some other function I should be using in place of rollapply?
Upvotes: 3
Views: 629
Reputation: 269644
rollapply is documented in ?rollapply, so there is no need to guess how it works.
To do what you want, fill in the missing days with NAs and then take the rolling mean. For example, to compute a mean over every three days rather than every three observations:
library(zoo)
# test data
tt <- as.Date("2000-01-01") + c(1, 2, 5, 6, 7, 8, 10)
z <- zoo(seq_along(tt), tt)
# fill it out to a daily series, zm, using NAs
g <- zoo(, seq(start(z), end(z), "day")) # zero width zoo series on a grid
zm <- merge(z, g)
rollapply(zm, 3, mean, na.rm = TRUE, fill = NA)
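The merge above relies on each day appearing at most once in the index. If several observations can share a day, as in the question, one possible alternative is to select the rows for each window by date directly. The following is only a rough sketch under stated assumptions: roll_mean_by_day is a hypothetical helper, not a zoo function; the data are made up; the dates are assumed to be whole days; and the window is trailing (as with align="right" in the question) rather than centered.

```r
library(zoo)

# Hypothetical helper, not part of zoo: for each distinct day d, average all
# values observed in the trailing window (d - width, d], i.e. the last
# `width` calendar days including d itself.
roll_mean_by_day <- function(days, x, width) {
  ends <- sort(unique(days))
  means <- sapply(seq_along(ends), function(i) {
    in_window <- days > ends[i] - width & days <= ends[i]
    mean(x[in_window])
  })
  zoo(means, ends)  # one value per distinct day
}

# made-up test data: two observations on the first day, then a gap
days <- as.Date("2000-01-01") + c(0, 0, 1, 4, 5)
x <- c(1, 3, 5, 7, 9)

res <- roll_mean_by_day(days, x, width = 3)
res
```

This loops over the distinct days and scans the whole vector each time, so it is simple but not fast for long series.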
Upvotes: 6