Zeus
Zeus

Reputation: 1590

parallel nested foreach loops

I'm trying to code a nested parallel foreach loop for a Metropolis-Hastings algorithm, but the matrices aren't combining correctly. Sample code is below, the final matrix, mtx2, should be same dimensions as the original, mtx, but with some rows randomly altered. How should the matrix rows be combined?

I tried the foreach package directly, but same result - mtx2 combines the columns 5 times.

# library(doParallel)
library(foreach)

no_cores <- detectCores() - 2  
cl <- makeCluster(no_cores)  
registerDoParallel(cl)  

mtx <- matrix(data=rnorm(n=1e3*5,mean=0,sd=1),nrow=1e3,ncol=5)
mtx2 <- matrix(data=NA,nrow=1e3,ncol=5)

#basic for loop - slow for large number of rows
for(k in 1:nrow(mtx)){
  for(r in 1:5) {
    if(runif(n=1,min=0,max=1)>0.9){
      mtx2[k,] <- mtx[k,]*10
    }else{
      mtx2[k,] <- mtx[k,]
    }  
  }
}

#series version for de-bugging
mtx2 <-foreach(k=1:nrow(mtx),.combine="rbind") %do% {
  foreach(r=1:5,.combine="c") %do% {
    if(runif(n=1,min=0,max=1)>0.9){
      mtx[k,]*10
    }else{
      mtx[k,]
    }  
  }
}

#parallel version
mtx2 <-foreach(k=1:nrow(mtx),.combine="rbind") %:% {
  foreach(r=1:5,.combine="c") %dopar% {
    if(runif(n=1,min=0,max=1)>0.9){
      mtx[k,]*10
    }else{
      mtx[k,]
    }  
  }
}

mtx2 <- round(mtx2,2)

Upvotes: 1

Views: 240

Answers (1)

Cole
Cole

Reputation: 11255

To expand on comments, you can skip the loop by creating your logical comparison all at once. Here, we create runif(nrow(mtx) * ncol(mtx)) but only take every 5th result to match up the OP inner loop of for (r in 1:5) {...}

The key point is that while the OP question of finding a method of updating a matrix in a nested parallel loop is not possible for this approach, refactoring code can sometimes provide significant performance gains.

nr = 1e4
nc = 5
mtx <- matrix(data=rnorm(n=nr*nc,mean=0,sd=1),nrow=nr,ncol=nc)

set.seed(123L)
lgl = matrix(runif(n = nr * nc), ncol = nc, byrow = TRUE)[, nc] > 0.9
mtx3 = sweep(mtx, 1L, 1 + 9 * lgl, FUN = '*')

all.equal(mtx2, mtx3) ##mtx2 was created with set.seed(123L)

# [1] TRUE

For 1 million rows this is significantly faster:

system.time({
  lgl = matrix(runif(n = nr * nc), ncol = nc, byrow = TRUE)[, nc] > 0.9
  mtx3 = sweep(mtx, 1L, 1 + 9 * lgl, FUN = '*')
})

##    user  system elapsed 
##    0.27    0.00    0.27 

system.time({
  for(k in 1:nrow(mtx)){
    for(r in 1:5) {
      if(runif(n=1,min=0,max=1)>0.9){
        mtx2[k,] <- mtx[k,]*10
      }else{
        mtx2[k,] <- mtx[k,]
      }  
    }
  }
})

##    user  system elapsed 
##   14.09    0.03   14.12 

Upvotes: 1

Related Questions