Reputation: 6217
I am trying to write R
code which acts as a "moving window", just with memory (state). I have figured out (thanks to this question) how to apply a function to subsequent tuples of elements. For example, if I wish to write a (simple) moving average with a typical period 4, I would do the following:
mapply(myfunc, x[1:(length(x)-4)], x[2:(length(x)-3)], x[3:(length(x)-2)], x[4:(length(x)-1)])
Where myfunc
is a function with 4 arguments, which calculates their mean (I cannot use mean
, as it expects only 1 argument, and I don't know how to make the 4 arguments a single vector).
That's quite cumbersome, though, and if the typical period is 100, say, I am not sure how to do it.
So here's my first question: how do I generalize this?
But here's another issue: suppose I wish the applied function to be able to save state. A simple example would be to keep record of how many values it was applied on so far. Another example is the exponential moving average (EMA), which is not really a window function, but instead a function which works on single values but which keeps state (the last resulted mean).
How can I write a function which when applied to a vector, works on its values one by one, returning a vector of the same length, which is able to retain its last output every time, or save any other "state" during its calculations? In Python, for example, I'd use classes for that, but that's quite difficult in R
.
Important note: I am not interested in auxiliary R
packages like zoo
or TTR
to do the work for me. I am trying to learn R
, and in any case the functions I wish to write, while having similarities with MA
or EMA
, are custom, and do not exist in any of these packages.
Upvotes: 0
Views: 167
Reputation: 52637
For things like this you should consider using a plain for
loop:
x <- runif(10000)
k <- 100
n <- length(x)
res <- numeric(n - k)
library(microbenchmark)
microbenchmark(times=5,
for(i in k:n) res[i - k + 1] <- sum(vec[i:(i + k)]),
{
r <- embed(x, n-k)[1:k, seq(n-k, 1)]
gg <- do.call("mapply", c("sum", split(r, 1:k)))
},
flt <- filter(x, rep(1, k))
)
Produces:
Unit: milliseconds
min lq median uq max neval
for 163.5403 164.4929 165.2543 166.6315 167.0608 5
embed/mapply 1255.2833 1307.3708 1338.2748 1341.5719 1405.1210 5
filter 6.7101 6.7971 6.8073 6.8161 6.8991 5
Now, the results are not identical and I don't pretend to understand exactly what GGrothendieck is doing with embed
, but generally speaking for
loops are just as fast as *pply
functions so long as you initialize your result vectors first. Windowed calculations don't lend themselves well to vectorization, so might as well use a for
loop.
EDIT: as several have pointed out in comments, there appears to be an internally implemented function to do (filter
) this that is quite a bit faster, so that seems to be the best option (though you should confirm it actually does what you want as again, the results are not exactly identical and I am not personally familiar with the function; in it's default configuration it appears to do a rolling weighted sum, or sum if weights are 1, with a centered window).
Upvotes: 1
Reputation: 269491
Regarding your first question,
n <- length(x)
k <- 4
r <- embed(x, n-k)[1:k, seq(n-k, 1)]
do.call("mapply", c("myfunc", split(r, 1:k)))
Regarding the second question, Reduce
can be used to iterate over a vector saving state.
Upvotes: 3