Michael Clinton
Michael Clinton

Reputation: 655

Vectorizing diff function on lags in R

I would like to make a function that does the following:

c <- rnorm(100)
n <- 10
sum.diff<- integer(n)

for (k in 1:n) {
   sum.diff[k] <- sum(diff(c, lag=k))
}

through vectorization rather than looping. Specifically, I want to send in one vector (here 'c'), and a vector of lag values (here '1:n'), and get out the sum of the k-th differences in the k-th entry of the output vector (here 'sum.lags').

For example, c <- 1:100 with n <- 10 should yield:

99 196 291 ... 900

which corresponds to:

sum(diff(1:100,lag=1)) sum(diff(1:100,lag=2)) sum(diff(1:100,lag=3)) ... sum(diff(1:100,lag=10)) Any ideas?

Upvotes: 0

Views: 278

Answers (3)

David Arenburg
David Arenburg

Reputation: 92282

A data.table implemitation (should be slightly faster than your code on big data sets)

a <- 1:100
b <- 1:10
library(data.table)
DT <- data.table(b)[, Res := sum(diff(a, b)), by = b]
DT

# b Res
# 1:  1  99
# 2:  2 196
# 3:  3 291
# 4:  4 384
# 5:  5 475
# 6:  6 564
# 7:  7 651
# 8:  8 736
# 9:  9 819
# 10: 10 900

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 269441

Try the following:

 sum.diff <- function(c, n) sapply(n, function(k) sum(diff(c, lag = k)))

Now run test:

 sum.diff(1:100, 1:10)
 ## [1]  99 196 291 384 475 564 651 736 819 900

Upvotes: 1

alexis_laz
alexis_laz

Reputation: 13122

Since it was mentioned in the comments about performance and C/C++, here is a way using .Call that seems valid:

library(inline)

ff = cfunction(sig = c(R_x = "numeric", R_lag = "integer"), body = '
   SEXP x, lag, ans;
   PROTECT(x = coerceVector(R_x, REALSXP));
   PROTECT(lag = coerceVector(R_lag, INTSXP));
   PROTECT(ans = allocVector(REALSXP, LENGTH(lag)));

   double *px = REAL(x), *pans = REAL(ans);
   memset(pans, 0, sizeof(double)*LENGTH(ans));
   R_len_t *plag = INTEGER(lag);

   for(int l = 0; l < LENGTH(lag); l++) 
       for(int i = 0; i < LENGTH(x) - plag[l]; i++) 
           pans[l] += px[i + plag[l]] - px[i];

   UNPROTECT(3);

   return(ans);
')

ff(1:100, 1:10)
#[1]  99 196 291 384 475 564 651 736 819 900

And some benchmarkings:

OPff = function(c, n) {
   sum.diff <- integer(n)
   for (k in 1:n) sum.diff[k] <- sum(diff(c, lag = k))
   sum.diff
}

ff2 = function(c, n) unlist(lapply(1:n, function(i) sum(diff(c, lag = i))))

xx = runif(1e4)
l = 1e3

identical(OPff(xx, l), ff(xx, 1:l))
#[1] TRUE
identical(OPff(xx, l), ff2(xx, l))
#[1] TRUE
library(microbenchmark)
microbenchmark(OPff(xx, l), ff(xx, 1:l), ff2(xx, l), times = 10)
#Unit: milliseconds
#        expr       min        lq    median        uq       max neval
# OPff(xx, l) 387.49171 390.43269 407.25796 427.09764 485.97181    10
# ff(xx, 1:l)  37.73505  38.27028  39.10201  41.33271  46.84648    10
#  ff2(xx, l) 384.35098 389.70397 401.51451 423.38513 436.85008    10

Upvotes: 2

Related Questions