Reputation: 111
I have 5 tables and I would like to find the combinations of their rows that fulfill some requirements. For the small example below I could easily solve this by generating every possible combination first and then selecting the rows that meet my conditions. The problem is that my original data consists of 5 tables with 200 rows each, which would require a couple of hundred GB of RAM if I generated all possible combinations up front (see the sketch after the data below).
So I tried this instead:
x1 <- seq(1,10,1)
x2 <- seq(5,15,3)
x3 <- seq(2,11,1)
x4 <- seq(1,5,1)
x5 <- seq(1,20,2)
The combinations should fulfill: x1 < x2 and x1 < x3.
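For reference, the generate-everything approach I mentioned above would look roughly like this (just a sketch; I am assuming expand.grid here, the exact way of building the cross join does not matter). It works for these small vectors, but for 5 tables of 200 rows it would have to materialise 200^5 rows before filtering:
all_comb <- expand.grid(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)
valid    <- all_comb[all_comb$x1 < all_comb$x2 & all_comb$x1 < all_comb$x3, ]
To avoid building everything at once, the nested loops below filter as they go: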
nm <- data.frame(matrix(NA, 1, 5))
for(a in 1:length(x1)){
  for(s in 1:length(x2)){
    for(d in 1:length(x3)){
      for(f in 1:length(x4)){
        for(g in 1:length(x5)){
          l1 <- x1[a]
          l2 <- x2[s]
          if(l1 < l2){
            l3 <- x3[d]
            if(l1 < l3){
              l4 <- x4[f]
              l5 <- x5[g]
              fy <- c()
              fy[1] <- l1
              fy[2] <- l2
              fy[3] <- l3
              fy[4] <- l4
              fy[5] <- l5
              nm <- rbind(nm, fy)
            }}}}}}}
In my original problem I have more if statements, which I hope will increase speed. But it has now been running for about 24 hours and is still not done. The small example above takes about 10 s, which makes me think it's stuck.
Upvotes: 0
Views: 60
Reputation: 132969
Two problems:
The huge problem is that you grow an object in a loop. This is about the slowest operation possible, because the whole object has to be copied and reallocated on every iteration. You need to preallocate the object and only grow it in chunks as necessary.
The medium problem is that you use a data.frame to store the results. Data.frames are useful, but element-wise assignment into them is slow. Use a matrix instead.
nm1 <- matrix(nrow = 1e3, ncol = 5) # adjust the chunk size to a reasonable estimate
rx <- 1
for(a in 1:length(x1)){
  for(s in 1:length(x2)){
    for(d in 1:length(x3)){
      for(f in 1:length(x4)){
        for(g in 1:length(x5)){
          l1 <- x1[a]
          l2 <- x2[s]
          if(l1 < l2){
            l3 <- x3[d]
            if(l1 < l3){
              l4 <- x4[f]
              l5 <- x5[g]
              # grow the matrix by another chunk only when it is full
              if(rx > nrow(nm1)) nm1 <- rbind(nm1, matrix(nrow = 1e3, ncol = 5))
              nm1[rx, 1] <- l1
              nm1[rx, 2] <- l2
              nm1[rx, 3] <- l3
              nm1[rx, 4] <- l4
              nm1[rx, 5] <- l5
              rx <- rx + 1
            }}}}}}}
nm1 <- nm1[seq_len(rx - 1),] # drop the unused preallocated rows
Timings:
Unit: milliseconds
expr min lq mean median uq max neval cld
mod() 589.2437 591.1576 594.4138 593.3678 595.0909 603.2087 5 a
original() 4934.4981 4952.4502 4980.6414 4953.3183 4985.7943 5077.1463 5 b
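The output above is in the format produced by the microbenchmark package. Assuming the two versions are wrapped into functions named original() and mod() (the names in the expr column), something along these lines should reproduce it:
library(microbenchmark)
# original(): the data.frame/rbind loop from the question
# mod():      the preallocated-matrix loop from this answer
microbenchmark(mod(), original(), times = 5)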
We get a factor-10 performance improvement without even starting to think about the algorithm. This factor gets bigger the more often the data.frame would have been grown. If this is still too slow, you could try byte-compiling the code with the compiler package (see the sketch below). It would also be straightforward to implement the loop as actual compiled code with Rcpp. However, you should benchmark with an increasing number of iterations and extrapolate the timings to your actual problem size. You might need to find a better algorithm than brute force, or reconsider whether you actually need to enumerate all of these combinations.
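A minimal sketch of the byte-compilation route, assuming the loop has been wrapped into a function mod() as above (note that on recent R versions the JIT compiler is enabled by default, so the additional gain may be small):
library(compiler)
mod_cmp <- cmpfun(mod)                      # byte-compiled version of the loop function
microbenchmark(mod(), mod_cmp(), times = 5) # compare against the uncompiled version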
Upvotes: 2