random_forest_fanatic

Reputation: 1252

rollapply with zoo and sub-daily data

I have a dataset with unequally spaced observations, and there are frequently several observations per day. I'd like to apply a function to windows of my data, but I want the windows to be defined by time rather than by row. For example, I'd like to compute the mean for days 1-5, days 2-6, etc. within my dataset, where days 1-5 may correspond to rows 1-13, days 2-6 to rows 3-18, and so on.

I saw that the rollapply function accepts zoo objects, and I assumed it would work as described above (i.e. applying the function over windows defined by time rather than by rows). However, this doesn't seem to be the case:

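library(zoo)
# one observation per day
my.ts = zoo( 1:100, as.Date("201401","%Y%j")+1:100 )
mean1 = rollapply( my.ts, 3, mean, align="right" )
# two observations per day
my.ts = zoo( 1:100, as.Date("201401","%Y%j")+1:100/2 )
mean2 = rollapply( my.ts, 3, mean, align="right" )
# compare the values only; == on zoo objects would first align them by index
all( coredata(mean1)==coredata(mean2) )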

I'd expect mean2 to be different from mean1 since mean2 has two observations per day instead of one. However, it appears that rollapply uses rows to define the windows rather than the times from the zoo object. Is there a work-around for this? Or, possibly some other function I should be using in place of rollapply?

Upvotes: 3

Views: 629

Answers (1)

G. Grothendieck

Reputation: 269644

rollapply is documented in ?rollapply so there is no need to guess how it works.

To do what you want, fill in the missing days with NAs and then take the rolling mean. For example, to compute a mean over every three days rather than every three observations:

library(zoo)

# test data
tt <- as.Date("2000-01-01") + c(1, 2, 5, 6, 7, 8, 10)
z <- zoo(seq_along(tt), tt)

# fill it out to a daily series, zm, using NAs
g <- zoo(, seq(start(z), end(z), "day")) # zero width zoo series on a grid
zm <- merge(z, g)

rollapply(zm, 3, mean, na.rm = TRUE, fill = NA)
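
The grid approach above covers missing days. For the sub-daily case in the question (several observations on the same day), one possible sketch is to select each window directly from the index values. The helper time_window_mean below is hypothetical and not part of zoo; it assumes a Date index, a window measured in days, and right alignment, i.e. the window (t - width_days, t] for each observation time t.

```r
library(zoo)

# Hypothetical helper (not part of zoo): for each observation time ti,
# average all values whose index lies in the window (ti - width_days, ti].
time_window_mean <- function(x, width_days) {
  idx  <- as.numeric(index(x))   # Date index as (possibly fractional) day numbers
  vals <- coredata(x)
  out  <- vapply(idx, function(ti) {
    mean(vals[idx > ti - width_days & idx <= ti])
  }, numeric(1))
  zoo(out, index(x))
}

# one observation per day vs. two observations per day, as in the question
z1 <- zoo(1:100, as.Date("2014-01-01") + 1:100)
z2 <- zoo(1:100, as.Date("2014-01-01") + (1:100) / 2)

m1 <- time_window_mean(z1, 3)   # each full window holds 3 rows
m2 <- time_window_mean(z2, 3)   # each full window holds 6 rows

coredata(m1)[100]   # 99, the mean of 98:100
coredata(m2)[100]   # 97.5, the mean of 95:100
```

Windows near the start of the series contain fewer rows, so the first few results are means over partial windows. This scans the whole index for every observation, which is fine for small series but may be slow for long ones.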

Upvotes: 6
