Improve performance of loop

Question

I am struggling to improve the performance of the code below, which runs over about 2M entries. Originally the condition was inside the loop. Moving it outside brought some improvement, but not enough.

Do you have any other ideas?


if (Floor == "Yes") {
  for (i in 1:length(X)) {
    base_short_term[i] <- pmax(numeric_vector1[i], (1+numeric_vector2[i])^((numeric_vector3[i])/(1+numeric_vector4[i])))
  }
} else {
  for (i in 1:length(X)) {
    base_short_term[i] <- pmin(numeric_vector5[i], (1+numeric_vector3[i])^((numeric_vector5[i])/(1+numeric_vector7[i])))
  }
}

linog · Accepted Answer

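Element-by-element loops are slow in R and should be avoided whenever a vectorized alternative exists, which is the case here. The arithmetic operators, `^`, `pmax()` and `pmin()` all work on whole vectors, so each computation below is a single call into compiled code instead of about 2 million interpreted iterations. The result is far more efficient and more readable: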

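# Collect the inputs; data.frame() needs all vectors to have the same length
df <- data.frame(x1 = numeric_vector1,
                 x2 = numeric_vector2,
                 x3 = numeric_vector3,
                 x4 = numeric_vector4,
                 x5 = numeric_vector5,
                 x7 = numeric_vector7)

# Floor is a single string, so the branch is chosen once;
# pmax()/pmin() and the arithmetic then work on whole columns
if (Floor == "Yes") {
  df$base_short_term <- pmax(df$x1, (1 + df$x2)^(df$x3 / (1 + df$x4)))
} else {
  df$base_short_term <- pmin(df$x5, (1 + df$x3)^(df$x5 / (1 + df$x7)))
}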
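A quick way to confirm that the vectorized form computes the same thing as the loop is to run both on made-up data. This is only a sketch: the vectors are random stand-ins (the real data is not shown in the question), the function names `loop_version()` and `vectorized_version()` are hypothetical, and only the `Floor == "Yes"` formula is checked.

```r
# Hypothetical stand-in data; the real vectors are not shown in the question
set.seed(42)
n  <- 1e5
v1 <- runif(n)
v2 <- runif(n)
v3 <- runif(n)
v4 <- runif(n)

# Element-by-element loop mirroring the question's Floor == "Yes" branch
# (the result is preallocated so the loop is not penalized by vector growth)
loop_version <- function(v1, v2, v3, v4) {
  out <- numeric(length(v1))
  for (i in seq_along(v1)) {
    out[i] <- pmax(v1[i], (1 + v2[i])^(v3[i] / (1 + v4[i])))
  }
  out
}

# Same formula applied to whole vectors at once
vectorized_version <- function(v1, v2, v3, v4) {
  pmax(v1, (1 + v2)^(v3 / (1 + v4)))
}

loop_result <- loop_version(v1, v2, v3, v4)
vec_result  <- vectorized_version(v1, v2, v3, v4)

# Both approaches should agree element for element
stopifnot(isTRUE(all.equal(loop_result, vec_result)))

# Rough timings; absolute numbers depend on the machine and R version
print(system.time(loop_version(v1, v2, v3, v4)))
print(system.time(vectorized_version(v1, v2, v3, v4)))
```

The timing lines are there so you can measure the gap on your own data rather than take it on trust.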

If a loop really cannot be avoided, `lapply` keeps it compact, and Rcpp lets you write the loop in C++, where it runs at compiled speed.

Update

If the vectors have different lengths, `data.frame()` generally cannot be built from them directly, so you will lose a little performance: you first need to slice each vector to the indices 1 to length(X), or use `lapply`.

Slicing the vectors

idx <- seq_along(X)  # indices 1..length(X), the same range the loop used
df <- data.frame(x1 = numeric_vector1[idx],
                 x2 = numeric_vector2[idx],
                 x3 = numeric_vector3[idx],
                 x4 = numeric_vector4[idx],
                 x5 = numeric_vector5[idx],
                 x7 = numeric_vector7[idx])

(This works because, even though the vectors have different lengths, your loop only ever uses indices up to length(X) for every vector.)
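The slicing can also be done in one step by putting the vectors in a named list and subsetting every element with `lapply()`. `with()` then evaluates the vectorized formula on the sliced vectors, so no data frame is needed. A minimal sketch with hypothetical toy vectors of unequal length, showing only the `Floor == "Yes"` branch:

```r
# Hypothetical toy inputs: X is shorter than every numeric vector
X <- 1:4
numeric_vector1 <- c(0.5, 2.0, 0.1, 3.0, 9, 9)
numeric_vector2 <- c(0.1, 0.2, 0.3, 0.4, 0.5)
numeric_vector3 <- c(1, 2, 3, 4, 5, 6, 7)
numeric_vector4 <- c(0, 1, 0, 1, 0)

# Keep only indices 1..length(X) of each vector, as the original loop did;
# lapply() calls `[`(vec, idx) on every list element
idx  <- seq_along(X)
vecs <- lapply(list(x1 = numeric_vector1, x2 = numeric_vector2,
                    x3 = numeric_vector3, x4 = numeric_vector4),
               `[`, idx)

# Vectorized formula evaluated on the sliced, equal-length vectors
base_short_term <- with(vecs, pmax(x1, (1 + x2)^(x3 / (1 + x4))))
print(base_short_term)
```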

`lapply`

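This reads much like your for loop and collects the results without you having to manage the output object, but it still makes one R function call per element, so it will not approach the speed of the vectorized version.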

For instance, when Floor == "Yes":

# lapply() returns a list, so unlist() turns it back into a numeric vector
base_short_term <- unlist(lapply(seq_along(X), function(i) {
  pmax(numeric_vector1[i], (1 + numeric_vector2[i])^(numeric_vector3[i] / (1 + numeric_vector4[i])))
}))
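`vapply()` is a type-checked relative of `lapply()`: you declare that each call returns a single number (`numeric(1)`), and it returns a plain numeric vector directly, raising an error if any iteration returns something else. A sketch using hypothetical random inputs for the `Floor == "Yes"` branch, checked against the vectorized formula:

```r
# Hypothetical random inputs standing in for the question's vectors
set.seed(7)
X <- seq_len(1000)
numeric_vector1 <- runif(1000)
numeric_vector2 <- runif(1000)
numeric_vector3 <- runif(1000)
numeric_vector4 <- runif(1000)

# One function call per index, each declared to return exactly one double
base_short_term <- vapply(seq_along(X), function(i) {
  pmax(numeric_vector1[i],
       (1 + numeric_vector2[i])^(numeric_vector3[i] / (1 + numeric_vector4[i])))
}, numeric(1))

# Same values as the vectorized expression, which remains the faster option
vectorized <- pmax(numeric_vector1,
                   (1 + numeric_vector2)^(numeric_vector3 / (1 + numeric_vector4)))
stopifnot(isTRUE(all.equal(base_short_term, vectorized)))
```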
