Reputation: 3
I am struggling to improve the performance of the code below, which runs over about 2M entries. At first the condition was inside the loop; moving it outside brought some improvement, but not enough.
Do you have any other ideas?
if (Floor == "Yes") {
  for (i in 1:length(X)) {
    base_short_term[i] <- pmax(numeric_vector1[i], (1 + numeric_vector2[i])^((numeric_vector3[i]) / (1 + numeric_vector4[i])))
  }
} else {
  for (i in 1:length(X)) {
    base_short_term[i] <- pmin(numeric_vector5[i], (1 + numeric_vector3[i])^((numeric_vector5[i]) / (1 + numeric_vector7[i])))
  }
}
Upvotes: 0
Views: 45
Reputation: 6226
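Explicit element-by-element loops are slow in R and should be avoided whenever a vectorized alternative exists. That is the case here: pmax, pmin and the arithmetic operators all work on whole vectors, so the computation can be done in one vectorized operation, which is far more efficient (no per-iteration overhead) and gives more readable code: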
df <- data.frame(x1 = numeric_vector1,
                 x2 = numeric_vector2,
                 x3 = numeric_vector3,
                 x4 = numeric_vector4,
                 x5 = numeric_vector5,
                 x7 = numeric_vector7)
# Same formulas as in the question, applied to whole columns at once
if (Floor == "Yes") {
  df$base_short_term <- pmax(df$x1, (1 + df$x2)^(df$x3 / (1 + df$x4)))
} else {
  df$base_short_term <- pmin(df$x5, (1 + df$x3)^(df$x5 / (1 + df$x7)))
}
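As a quick sanity check, the vectorized form can be compared against the original loop on synthetic data. This is only a sketch: the vectors v1 to v4 and the size n are randomly generated stand-ins, not the data from the question, and the timings will vary from machine to machine.

```r
set.seed(42)
n <- 1e5  # smaller than the 2M entries in the question, to keep the loop quick

# Hypothetical stand-ins for numeric_vector1..numeric_vector4
v1 <- runif(n, 0.5, 1.5)
v2 <- runif(n, 0, 0.1)
v3 <- runif(n, 0, 5)
v4 <- runif(n, 0, 1)

# Original approach: one element at a time
loop_result <- numeric(n)
loop_time <- system.time(
  for (i in seq_len(n)) {
    loop_result[i] <- pmax(v1[i], (1 + v2[i])^(v3[i] / (1 + v4[i])))
  }
)

# Vectorized approach: whole vectors at once
vec_time <- system.time(
  vec_result <- pmax(v1, (1 + v2)^(v3 / (1 + v4)))
)

# Both approaches should give the same numbers
stopifnot(isTRUE(all.equal(loop_result, vec_result)))

# Elapsed seconds for each approach
print(c(loop = loop_time[["elapsed"]], vectorized = vec_time[["elapsed"]]))
```

The same comparison applies to the pmin branch by swapping in the corresponding vectors.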
If loops cannot be avoided, it's better to use lapply, or to move the loop to Rcpp.
If the vectors have different lengths, you will lose some performance, because you first need to slice each of them from 1 to length(X) (or use lapply):
idx <- seq_along(X)  # indices 1..length(X), computed once
df <- data.frame(x1 = numeric_vector1[idx],
                 x2 = numeric_vector2[idx],
                 x3 = numeric_vector3[idx],
                 x4 = numeric_vector4[idx],
                 x5 = numeric_vector5[idx],
                 x7 = numeric_vector7[idx])
(This solution works because, even if the vectors do not have the same length, you only use indices up to length(X) for all of them.)
lapply
It looks very much like your for loop, but it can be more efficient since it avoids creating and discarding objects at each iteration. For instance, if Floor is "Yes":
# lapply returns a list, so unlist() turns the result back into a numeric vector
base_short_term <- unlist(lapply(seq_along(X), function(i) {
  pmax(numeric_vector1[i], (1 + numeric_vector2[i])^(numeric_vector3[i] / (1 + numeric_vector4[i])))
}))
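Because lapply returns a list, the result usually has to be flattened before it can be used as a numeric vector. One possible alternative, assuming each iteration yields exactly one number, is vapply, which returns a numeric vector directly and checks the type of every result. The small vectors below are made-up values for illustration only.

```r
# Made-up example data (not from the question)
numeric_vector1 <- c(1.00, 1.20, 0.90)
numeric_vector2 <- c(0.05, 0.02, 0.10)
numeric_vector3 <- c(2, 3, 1)
numeric_vector4 <- c(0.5, 0.0, 1.0)

base_short_term <- vapply(
  seq_along(numeric_vector1),
  function(i) {
    # max() is enough here since each call sees a single element
    max(numeric_vector1[i],
        (1 + numeric_vector2[i])^(numeric_vector3[i] / (1 + numeric_vector4[i])))
  },
  numeric(1)  # each call must return a single numeric value
)

# Same values as the fully vectorized expression
vectorized <- pmax(numeric_vector1,
                   (1 + numeric_vector2)^(numeric_vector3 / (1 + numeric_vector4)))
stopifnot(isTRUE(all.equal(base_short_term, vectorized)))
```

For 2M entries the fully vectorized pmax/pmin call is still likely to be the fastest of these options, since vapply calls the R function once per element.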
Upvotes: 1