Divi

Reputation: 1634

Improving run time for R with nested for loops

My reproducible R example:

f = runif(1500, 10, 50)
p = matrix(0, nrow = 1250, ncol = 250)
count = rep(0, 1250)
for (i in 1:1250) {
    ref = f[i]                            # reference value, reset after each hit
    for (j in 1:250) {
        p[i, j] = f[i + j - 1] / ref - 1  # relative change vs. the reference
        if (is.nan(p[i, j])) {
            count[i] = count[i]           # NaN: leave the count unchanged
        } else if (p[i, j] > 0.026) {
            count[i] = count[i] + 1
            ref = f[i + j - 1]
        }
    }
}

To be more precise, I have a set of 600 f-series, and this code runs 200 times for each f-series. Currently I do the iterations in loops, and most of the operations are element-wise. What varies between runs is f, the condition if(p[i,j] > (0.026)), and the threshold 0.026 itself.

I expect the run time could be reduced drastically by vectorizing the code and using functions, specifically the apply family, but I am rusty with apply and am looking for advice on how to proceed.

Upvotes: 1

Views: 1075

Answers (3)

Divi

Reputation: 1634

@ajmartin, your logic was better and reduced the number of iterations I was attempting. Here is the improved version of your code in R:

f1 <- function() {
  n <- 1500
  d <- 250
  f <- runif(n, 1, 5)
  count <- rep(0, n - d)
  for (i in 1:(n - d)) {
    w <- f[i:(i + d - 1)]                  # window starting at i
    tem <- w / w[1] - 1
    ind <- which(tem > 0.026)[1]           # first hit, NA if there is none
    while (!is.na(ind)) {
      count[i] <- count[i] + 1
      tem[ind:d] <- w[ind:d] / w[ind] - 1  # re-base on the value at the hit
      ind <- ind - 1 + which(tem[ind:d] > 0.026)[1]
    }
  }
  list(f, count)
}

system.time(f1())[3]
# elapsed 
#    0.09 

Implementing this in Rcpp would reduce the run time further, but I can't install Rtools because I don't have admin rights on my current machine. Meanwhile, this helps.
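Since the real use case covers many series and a varying threshold, here is a rough sketch of how the per-window count could be packaged as a helper and applied with vapply. The names count_hits and count_all are hypothetical, and this only illustrates the idea; it is not a benchmarked replacement:

```r
# Hypothetical helper: count threshold crossings in one window w,
# re-basing the reference value on every hit.
count_hits <- function(w, thr = 0.026) {
  hits <- 0
  start <- 1
  d <- length(w)
  repeat {
    # first position at or after `start` whose relative change exceeds thr
    k <- which(w[start:d] / w[start] - 1 > thr)[1]
    if (is.na(k)) break
    hits <- hits + 1
    start <- start + k - 1   # the hit becomes the new reference
  }
  hits
}

# Hypothetical wrapper: apply count_hits to every length-d window of f.
# Assumes length(f) > d.
count_all <- function(f, d = 250, thr = 0.026) {
  vapply(seq_len(length(f) - d),
         function(i) count_hits(f[i:(i + d - 1)], thr),
         numeric(1))
}

count_hits(c(10, 10.1, 10.5, 10.6, 11))   # 2 (hits at 10.5 and at 11)
```

Because the reference resets after each hit, the inner search stays sequential; vapply only tidies the outer loop and makes the threshold a parameter.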

Upvotes: 0

ajmartin

Reputation: 2409

Here is an implementation using while, although it takes much longer than the nested for loops, which is a bit counterintuitive.

f1 <- function() {
    n <- 1500
    d <- 250
    f = runif(n,1,5)
    f = embed(f, d)
    f = f[-(n-d+1),]
    count = rep(0, n-d)
    for(i in 1:(n-d)) {
        tem <- f[i,]/f[i,1] - 1
        ti <- which(tem[-d] > 0.026)[1]
        while(!is.na(ti) && ti < d) {
            ti.plus = ti+1
            tem[ti.plus:d] = f[i, ti.plus:d] / tem[ti]
            count[i] = count[i] + 1
            ti <- ti + which(tem[ti.plus:d-1] > 0.026)[1]
        }
        f[i,] = tem
    }
    list(f, count)
}

system.time(f1())

#elapsed 
#6.365
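
A note on embed(), in case it helps when checking the code above: it returns each window in reverse order, so column 1 holds the latest value of the window, not the first. A small illustration on a toy vector (the variable names exist only for this example):

```r
x <- c(1, 2, 3, 4, 5)
e <- embed(x, 3)   # row i is x[i+2], x[i+1], x[i]
e[1, ]             # 3 2 1
w <- e[, 3:1]      # reverse the columns to get forward windows
w[1, ]             # 1 2 3
```

If forward windows are what is wanted, reversing the columns once, before the loop, is probably the simplest fix.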

Upvotes: 1

Khashaa

Reputation: 7373

It is quite easy to port for loops to Rcpp. I just copy-pasted your code into Rcpp and haven't checked its validity; let me know if you find a discrepancy. fCpp returns a list of p and c.

library(Rcpp)

cppFunction('List fCpp(NumericVector f) {
    const int n = 1250;
    const int k = 250;
    NumericMatrix p(n, k);
    NumericVector c(n);

    for (int i = 0; i < n; i++) {
        double ref = f[i];
        for (int j = 0; j < k; j++) {
            p(i, j) = f[i + j + 1] / ref - 1;
            if (R_IsNaN(p(i, j))) {   // x == NAN is always false in C++
                c[i] = c[i];
            } else if (p(i, j) > 0.026) {
                c[i] = c[i] + 1;
                ref = f[i + j + 1];
            }
        }
    }
    return List::create(p, c);
}')

Benchmark

set.seed(1)
f = runif(1500,10,50)

f1 <- function(f){
    p = matrix(0, nrow=1250, ncol=250)
    count = rep(0, 1250)
    for(i in 1:1250) {
        ref=f[i]
        for(j in 1:250) {
            p[i,j] = f[i + j - 1] / ref - 1
            if(is.nan(p[i,j])) {
                count[i] = count[i]
            }
            else if(p[i,j] > (0.026)) {
                count[i] = (count[i] + 1) 
                ref = f[i + j - 1] 
            } 
        } 
    }
    list(p, count)
}


microbenchmark::microbenchmark(fCpp(f), f1(f), times=10L, unit="relative")
Unit: relative
    expr      min       lq     mean   median       uq      max neval
 fCpp(f)   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000    10
   f1(f) 785.8484 753.7044 734.4243 764.5883 718.0868 644.9022    10

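The values returned by fCpp(f) and f1(f) are essentially identical, except for the p matrix: column 1 of the p returned by f1 is filled with 0s (at j = 1 it divides f[i] by itself), whereas fCpp starts one element later, at f[i+j+1] in 0-based indexing.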

system.time(a <- f1(f))[3]
#elapsed 
#    2.8 
system.time(a1 <- fCpp(f))[3]
#elapsed 
#      0 
all.equal( a[[1]], a1[[1]])
#[1] "Mean relative difference: 0.7019406"
all.equal( a[[2]], a1[[2]])
#[1] TRUE
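
On the NaN branch: comparing with == is not a reliable way to detect NaN, because NaN compares unequal to everything under IEEE 754 arithmetic, including itself. In R the dedicated test is is.nan(). A minimal illustration:

```r
x <- 0 / 0              # NaN
is.nan(x)               # TRUE: the dedicated test
isTRUE(x == x)          # FALSE: NaN == NaN evaluates to NA in R, not TRUE
is.nan(c(1, NaN, NA))   # FALSE TRUE FALSE: NA is not NaN
```

With f drawn from runif(1500, 10, 50), ref is never 0, so the NaN branch should not be reached in this benchmark anyway.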

Upvotes: 4
