Reputation: 12087

rollapply() by n-months

I am looking for a way to use rollapply to split a series into sequences by n-months. Suppose you have the following:

z <- zoo(101:465, as.Date(1:365))
as.data.frame(z)

I would like to get a list of vectors (or lists) of index values for each n-months so I can work on the data... Much like the width parameter is implemented in rollapply except the width in this case is variable (depending on the days in a month).

NOTE: would prefer a base-R solution but would be interesting to see other libraries that can be used

Upvotes: 2

Answers (3)

Denis

Reputation: 12087

Ended up rolling my own rollapply() for a zoo object:

HELPER FUNCTION

   get.months.elapsed <- function(start.date, end.date) {
      ed <- as.POSIXlt(end.date)
      sd <- as.POSIXlt(start.date)
      12 * (ed$year - sd$year) + (ed$mon - sd$mon)
    }

1. FORWARD ROLLING WINDOW:

rollapply.list.date.range <- function(data, num.of.months, FUN) 
{
  dates.list <- index(data)
  seq.list <- sapply(dates.list, FUN = function(x) {
     dt <- as.integer(x[1])
     cur.seq.list <- separate.by.months(dt, dates.list, num.of.months)
     names(cur.seq.list) <- dt
     return(cur.seq.list)
  })

  lapply(seq.list, FUN)
}

separate.by.months <- function(dt, dates.list, num.of.months) 
{
  date.seq.indexes <- sapply(dates.list, function(x) { 
    date.diff <- as.integer(x) - dt
    date.normalized <- get.months.elapsed(as.Date(0), as.Date(date.diff))
    floor(date.normalized  / num.of.months)
  })

  seq.list <- split(dates.list, date.seq.indexes)
  seq.list["0"]
}

2. REVERSE ROLLING WINDOW:

rollapply.date.range <- function(data, num.of.months, FUN) 
{
  dates.list <- rev(index(data))
  seq.list <- sapply(dates.list, FUN = function(x) {
     dt <- x[1]
     cur.seq.list <- separate.by.months(dt, dates.list, num.of.months)
     names(cur.seq.list) <- dt
     return(cur.seq.list)
  })

  lapply(seq.list, FUN)
}

separate.by.months <- function(dt, dates.list, num.of.months) 
{
  date.seq.indexes <- sapply(dates.list, function(x) { 
    date.diff <- as.integer(x) - as.integer(dt)
    date.normalized <- ifelse(sign(date.diff) == 1, -9999, -get.months.elapsed(as.Date(0), as.Date(date.diff)))
    floor(date.normalized  / num.of.months)
  })

  seq.list <- split(dates.list, date.seq.indexes)
  seq.list["0"]
}

And then you would call it like:

rollapply.list.date.range(z, 3, mean)

Upvotes: 1

G. Grothendieck

Reputation: 270448

1. aggregate

If what you are looking for is the processing described by the code in your answer to your question then what you are looking for would be best descrbed as an aggregation rather than a rolling application of a function.

To get the mean of each month, each quarter and each n months use aggregate.zoo:

myfun <- mean
aggregate(z, as.yearmon, myfun)
## Jan 1970 Feb 1970 Mar 1970 Apr 1970 May 1970 Jun 1970 Jul 1970 Aug 1970 
##    115.5    144.5    174.0    204.5    235.0    265.5    296.0    327.0 
## Sep 1970 Oct 1970 Nov 1970 Dec 1970 Jan 1971 
##    357.5    388.0    418.5    449.0    465.0 

aggregate(z, as.yearqtr, myfun)
## 1970 Q1 1970 Q2 1970 Q3 1970 Q4 1971 Q1 
##   145.0   235.0   326.5   418.5   465.0 

n <- 3
aggregate(z, as.Date(cut(index(z), paste(n, "months"))), myfun)
## 1970-01-01 1970-04-01 1970-07-01 1970-10-01 1971-01-01 
##      145.0      235.0      326.5      418.5      465.0

or use as.yearmon in place of as.Date. In the above mean can be replaced with an arbitrary function.

2. rollapply

a) If you really did want to roll over n months then create a zoo object ag with one row per month and 31 columns filling the extra columns in short months with NA. Then run rollapplyr with a function that unravels the data for each iteration into one long vector removing the NAs that were added at the end of the short months and feeding it into our arbitrary function.

n <- 3
myfun <- mean

ag <- aggregate(z, as.yearmon, "length<-", value = 31)
rollapplyr(ag, n, function(x) myfun(na.omit(c(t(x)))), fill = NA, by.column = FALSE)
## Jan 1970 Feb 1970 Mar 1970 Apr 1970 May 1970 Jun 1970 Jul 1970 Aug 1970 
##       NA       NA    145.0    175.0    204.5    235.0    265.5    296.5 
## Sep 1970 Oct 1970 Nov 1970 Dec 1970 Jan 1971 
##    326.5    357.5    388.0    418.5    434.5

b) Another possibility is:

s <- split(z, as.yearmon(index(z)))
r <- rollapplyr(seq_along(s), n, function(ix) myfun(unlist(s[ix])), fill = NA)
zoo(r, as.yearmon(names(s), "%b %Y"))   
## Jan 1970 Feb 1970 Mar 1970 Apr 1970 May 1970 Jun 1970 Jul 1970 Aug 1970 
##       NA       NA    145.0    175.0    204.5    235.0    265.5    296.5 
## Sep 1970 Oct 1970 Nov 1970 Dec 1970 Jan 1971 
##    326.5    357.5    388.0    418.5    434.5

3. rollapply with mean

The following work with mean but depending on what your arbitrary function is they may be modifiable to work with it.

a) First, create a 2 column zoo object ag whose rows are the sum and length of each month and then use rollapplyr on that.

n <- 3
ag2 <- aggregate(z, as.yearmon, function(x) c(sum(x), length(x)))
rollapplyr(ag2, 3, function(x) sum(x[, 1]) / sum(x[, 2]), fill = NA, by.column = FALSE)
## Jan 1970 Feb 1970 Mar 1970 Apr 1970 May 1970 Jun 1970 Jul 1970 Aug 1970 
##       NA       NA    145.0    175.0    204.5    235.0    265.5    296.5 
## Sep 1970 Oct 1970 Nov 1970 Dec 1970 Jan 1971 
##    326.5    357.5    388.0    418.5    434.5

b) Or yet another alternative is to create a complex zoo object ag3 whose real and imaginary parts are the sum and number of days in each month and the use rollapplyr on that:

ag3 <- aggregate(z, as.yearmon, function(x) complex(real = sum(x), imag = length(x)))
rollapplyr(ag3, 3, function(x) sum(Re(x)) / sum(Im(x)), fill = NA)
## Jan 1970 Feb 1970 Mar 1970 Apr 1970 May 1970 Jun 1970 Jul 1970 Aug 1970 
##       NA       NA    145.0    175.0    204.5    235.0    265.5    296.5 
## Sep 1970 Oct 1970 Nov 1970 Dec 1970 Jan 1971 
##    326.5    357.5    388.0    418.5    434.5

Upvotes: 3

utubun

Reputation: 4505

Not sure, I got it right. But maybe that could work:

# create data
z <- zoo::zoo(101:465, as.Date(1:365))

# everything you need is cut it by quarter
quarters <- cut(as.Date(index(z)), breaks = 'quarter', labels = F)
# but if you want list of indices, you make them this way
idxs <- split(seq_along(z), quarters)

# to see what you've got
dplyr::glimpse(idxs)
List of 5
 $ 1: int [1:89] 1 2 3 4 5 6 7 8 9 10 ...
 $ 2: int [1:91] 90 91 92 93 94 95 96 97 98 99 ...
 $ 3: int [1:92] 181 182 183 184 185 186 187 188 189 190 ...
 $ 4: int [1:92] 273 274 275 276 277 278 279 280 281 282 ...
 $ 5: int 365