Jixxi

Reputation: 111

Suggestions on speeding up my 5 nested loops in R?

I have 5 tables and would like to find combinations of their rows that fulfill some requirements. For the small data below I could easily solve this by using matrix multiplication to create all possible combinations and then selecting the rows that fulfill my conditions. The problem is that my actual problem consists of 5 tables with 200 rows each, so generating all possible combinations would require a couple of hundred GB of RAM.
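For the small example, the generate-then-filter version can be sketched like this (using expand.grid to materialize all combinations; this is a sketch of the idea, not my exact matrix-multiplication code):

x1 <- seq(1, 10, 1)
x2 <- seq(5, 15, 3)
x3 <- seq(2, 11, 1)
x4 <- seq(1, 5, 1)
x5 <- seq(1, 20, 2)

# Materialize every combination, then keep the rows meeting the conditions.
# With 200 values per table this would be 200^5 = 3.2e11 rows, hence the
# memory blow-up.
grid <- expand.grid(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)
res  <- grid[grid$x1 < grid$x2 & grid$x1 < grid$x3, ]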

So I tried this instead:

x1 <- seq(1,10,1)
x2 <- seq(5,15,3)
x3 <- seq(2,11,1)
x4 <- seq(1,5,1)
x5 <- seq(1,20,2)

Which should fulfill: x1 < x2 and x1 < x3.

nm <- data.frame(matrix(NA, 1, 5))

for(a in 1:length(x1)){
  for(s in 1:length(x2)){
    for(d in 1:length(x3)){
      for(f in 1:length(x4)){
        for(g in 1:length(x5)){

          l1 <- x1[a]
          l2 <- x2[s]

          if(l1 < l2){

            l3 <- x3[d]

            if(l1 < l3){

              l4 <- x4[f]
              l5 <- x5[g]

              fy <- c()
              fy[1] <- l1
              fy[2] <- l2
              fy[3] <- l3
              fy[4] <- l4
              fy[5] <- l5

              nm <- rbind(nm, fy)
            }}}}}}}

In my original problem I have more if statements, which I hope will prune the search and increase speed. But it has now been running for about 24 hours and is still not done. The example above takes about 10 s, which makes me think the full run is stuck.

Upvotes: 0

Views: 60

Answers (1)

Roland

Reputation: 132969

Two problems:

The huge problem is that you grow an object in a loop: each rbind copies the entire object into freshly allocated memory, so the cost increases with every iteration. You need to preallocate the object and only grow it in chunks as necessary, as in the code below.

The medium problem is that you use a data.frame to store the results. Data.frames are useful, but indexed assignment into them is slow. Use a matrix instead.

nm1 <- matrix(nrow = 1e3, ncol = 5) #adjust the chunk size to a reasonable estimate
rx <- 1 #index of the next free row

for(a in 1:length(x1)){
  for(s in 1:length(x2)){
    for(d in 1:length(x3)){
      for(f in 1:length(x4)){
        for(g in 1:length(x5)){

          l1 <- x1[a]
          l2 <- x2[s]

          if(l1 < l2){

            l3 <- x3[d]

            if(l1 < l3){

              l4 <- x4[f]
              l5 <- x5[g]

              if(rx > nrow(nm1)) nm1 <- rbind(nm1, matrix(nrow = 1e3, ncol = 5)) #grow by a chunk only when full

              nm1[rx, 1] <- l1
              nm1[rx, 2] <- l2
              nm1[rx, 3] <- l3
              nm1[rx, 4] <- l4
              nm1[rx, 5] <- l5

              rx <- rx + 1

            }}}}}}}

nm1 <- nm1[seq_len(rx - 1),] #drop the unused preallocated rows
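If you need a data.frame downstream, convert once at the end rather than paying the data.frame overhead inside the loop:

nm1_df <- as.data.frame(nm1)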

Timings:

Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval cld
      mod()  589.2437  591.1576  594.4138  593.3678  595.0909  603.2087     5  a 
 original() 4934.4981 4952.4502 4980.6414 4953.3183 4985.7943 5077.1463     5   b
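The table is microbenchmark output. A sketch of how such a comparison can be set up (the bodies of original() and mod() here are condensed versions of the two loops above, not the exact benchmark code):

library(microbenchmark)

x1 <- seq(1, 10, 1); x2 <- seq(5, 15, 3); x3 <- seq(2, 11, 1)
x4 <- seq(1, 5, 1);  x5 <- seq(1, 20, 2)

original <- function() {              # grow a data.frame with rbind
  nm <- data.frame(matrix(NA, 1, 5))
  for (a in x1) for (s in x2) for (d in x3) for (f in x4) for (g in x5)
    if (a < s && a < d) nm <- rbind(nm, c(a, s, d, f, g))
  nm
}

mod <- function() {                   # preallocate a matrix, grow in chunks
  nm1 <- matrix(nrow = 1e3, ncol = 5)
  rx <- 1
  for (a in x1) for (s in x2) for (d in x3) for (f in x4) for (g in x5)
    if (a < s && a < d) {
      if (rx > nrow(nm1)) nm1 <- rbind(nm1, matrix(nrow = 1e3, ncol = 5))
      nm1[rx, ] <- c(a, s, d, f, g)
      rx <- rx + 1
    }
  nm1[seq_len(rx - 1), ]
}

microbenchmark(mod(), original(), times = 5)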

We get close to a factor-10 performance improvement without even starting to think about the algorithm, and the factor gets bigger the more often the data.frame would have been grown. If this is still too slow, you could try byte-compiling the code using the compiler package. It would also be trivial to implement as actual compiled code with Rcpp. However, you should benchmark with an increasing number of iterations and extrapolate the timings to your actual problem. You might need a better algorithm than brute force, or reconsider whether you actually need to do this at all.
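Byte-compiling is a one-liner with the base compiler package (a sketch, reusing the mod() wrapper from the benchmark sketch above; note that R >= 3.4 JIT byte-compiles functions by default, so the gain may be small there):

library(compiler)
mod_cmp <- cmpfun(mod)  # byte-compiled version of the function
mod_cmp()               # called exactly like mod()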

Upvotes: 2
