stefanodv
stefanodv

Reputation: 533

Optimization R loop when each iteration is dependent on the results of previous iterations

I need to optimize an R script. In particular, I need to speed up or delete some of the script's inundation cycles. I have defined many cycles of the type:

DT <- data.frame("x"=c(1:20),
                 "y"=c(20:1))
DT$vect[1] <- DT$y[1]
for (i in 2:20) {
  DT$vect[i] <- DT$vect[i-1] * DT$x[i] - DT$x[i-1] * (1 + DT$y[i]) 
}

Since to calculate the value at position i one needs to know that at position i-1. I can not think of a better solution.

Does anyone know a smarter one?

Upvotes: 4

Views: 1050

Answers (2)

MrFlick
MrFlick

Reputation: 206187

It might not be that much prettier, but you can use dplyr and purrr to do a reduce type function.

DT %>% 
  select(x,y) %>% 
  mutate(prevx=lag(x, default=-1)) %>% 
  transpose() %>% 
  accumulate(function(prev, xx) {
    prev * xx$x - xx$prevx*(1+xx$y)
  }, .init=-1/DT$x[1]) %>% 
  tail(-1)
#  [1] 2.000000e+01 2.000000e+01 2.200000e+01 3.400000e+01 1.020000e+02
#  [6] 5.320000e+02 3.634000e+03 2.897400e+04 2.606620e+05 2.606512e+06
# [11] 2.867152e+07 3.440582e+08 4.472756e+09 6.261858e+10 9.392787e+11
# [16] 1.502846e+13 2.554838e+14 4.598709e+15 8.737547e+16 1.747509e+18

We use the lag() function to get both x[i] and x[i-1] on the same row. We use transpose to get a list of named values that we can iterate over. Then accumulate() allows use to keep plugging the output of a function back into itself as input and keeps track of the values along the way. Here we plug in the formula provided and use a special initial value that satisfied the initial conditions you gave of having the first value be equal to the first y value. Finally we trim off the dummy first value.

Upvotes: 3

Ralf Stubner
Ralf Stubner

Reputation: 26823

@MrFlick's solution is very nice, but if you are more comfortable with the for loop and don't mind mixing in another language, you can give Rcpp a try. This type of loops are a prime example where C++ is more efficient:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector forLoop(DataFrame dt) {
  int N = dt.nrow();
  NumericVector x = dt["x"];
  NumericVector y = dt["y"];
  NumericVector vec(N, y(0));
  for (int i = 1; i < N; ++i) {
    vec(i) = vec(i-1) * x(i) - x(i-1) * (1 + y(i));
  } 
  return vec;
}

/*** R

N <- 20000
DT <- data.frame("x"=c(1:N),
                 "y"=c(N:1))
DT$vect[1] <- DT$y[1]
system.time({
  for (i in 2:N) {
    DT$vect[i] <- DT$vect[i-1] * DT$x[i] - DT$x[i-1] * (1 + DT$y[i]) 
  }
})
DT2 <- data.frame("x"=c(1:N),
                 "y"=c(N:1))
vect <- vector("numeric", length = N)
vect[1] <- DT2$y[1]
system.time({
  for (i in 2:N) {
    vect[i] <- vect[i-1] * DT2$x[i] - DT2$x[i-1] * (1 + DT2$y[i]) 
  }
  DT2$vect <- vect
})

all.equal(DT, DT2)

DT3 <- data.frame("x"=c(1:N),
                 "y"=c(N:1))
system.time({
  vect <- forLoop(DT3)
  DT3$vect <- vect
})
all.equal(DT, DT3)
*/

The original loop takes 1.5 s on my machine, while the C++ solution DT3 is "instantaneous". Between the two there is a minor optimization you can do in R: Do not write to a data.frame inside a loop. You are better of writing to a vector and adding that in the end. Here the output of profvis for DT and DT2:

profiler

Still much slower than C++, though.

Upvotes: 2

Related Questions