Andrew Dempsey

Reputation: 190

R - How to parallelize a for loop that does Monte Carlo simulations?

I am trying to work out how to parallelize some code from "Data Mining with R: Learning with Case Studies" so that it runs faster on my MacBook Pro. The code in question is below. It uses the same data (DSs) and applies six different learners (e.g. SVM and nnet, for both regression and classification), each with a small number of variants.

The full code is here (near the bottom, in the "model evaluation and selection" section).

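library(DMwR)  # experimentalComparison() and variants() come from the book's package

# For each learner name in TODO, compare three ways of using it on the same
# data sets (DSs) under the Monte Carlo settings in MCsetts:
#   singleModel - a single model, learned once
#   slide       - sliding-window relearning every 60 or 120 steps
#   grow        - growing-window relearning every 60 or 120 steps
# singleModel(), slide(), grow(), TODO, VARS, DSs and MCsetts are defined
# earlier in the book's code.
for (td in TODO) {
  assign(td,
         experimentalComparison(
           DSs,
           c(
             do.call('variants',
                     c(list('singleModel', learner = td), VARS[[td]],
                       varsRootName = paste('single', td, sep = '.'))),
             do.call('variants',
                     c(list('slide', learner = td,
                            relearn.step = c(60, 120)),
                       VARS[[td]],
                       varsRootName = paste('slide', td, sep = '.'))),
             do.call('variants',
                     c(list('grow', learner = td,
                            relearn.step = c(60, 120)),
                       VARS[[td]],
                       varsRootName = paste('grow', td, sep = '.')))
           ),
           MCsetts)
  )
  # save the results for this learner to <learner>.Rdata
  save(list = td, file = paste(td, 'Rdata', sep = '.'))
}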

Most of the parallelization information I find seems to apply to things like apply(), where the same function is applied to different subsets of the data. This code does the opposite: different functions are applied to the same data.

Would it be better to parallelize the outer for loop, so that several learners run at the same time, or to parallelize the code inside the loop, so that the different windowing approaches run in parallel for a single learner?

A single iteration takes just over 2 hours on my MacBook, and only 2 cores appear to be doing anything (the other two sit idle). The actual code from the link is set to 20 iterations, so it would be great to put the idle cores to work and reduce this.

Upvotes: 1

Views: 1854

Answers (1)

Richie Cotton

Reputation: 121057

In the non-parallel case, passing functions into an lapply loop is straightforward.

lapply(c(mean, sum), function(f) f(1:5))

There are a few different systems for parallel programming with R. This next example uses snow.

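library(snow)
# Start a socket cluster with two worker processes on this machine
cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")
# Send one function to each worker; each worker applies it to 1:5
clusterApply(cl, c(mean, sum), function(f) f(1:5))
# Shut the workers down
stopCluster(cl)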
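Since R 2.14.0, the same cluster functions are also available in the `parallel` package that ships with R, so this example can be run without installing snow. A minimal sketch, assuming at least two local cores; `makeCluster(2)` starts a PSOCK (socket) cluster, the same kind of cluster as `type = "SOCK"` above:

```r
library(parallel)

# Start a local socket (PSOCK) cluster with two worker processes
cl <- makeCluster(2)

# Each worker receives one function and applies it to the same data, 1:5
res <- clusterApply(cl, c(mean, sum), function(f) f(1:5))

# Shut the workers down when finished
stopCluster(cl)

# Should match the serial lapply() version exactly
stopifnot(identical(res, lapply(c(mean, sum), function(f) f(1:5))))
```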

You should get the same answer in each case!
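For the loop in the question, a simple option is to parallelize the outer loop: each learner in TODO is an independent job, and each job writes its own .Rdata file, so nothing has to be combined afterwards. On a Mac, `parallel::mclapply()` forks the running R session, so objects such as DSs, VARS and MCsetts are visible to the workers without exporting them. This is a sketch under assumptions, not code from the book: `run_learner()` is a hypothetical stand-in for the `experimentalComparison()` call in the loop body, the TODO values are placeholders, and the files go to `tempdir()` rather than the working directory.

```r
library(parallel)

# Hypothetical stand-in for the body of the original for loop; in the real
# code this would be the experimentalComparison(DSs, ..., MCsetts) call for td
run_learner <- function(td) {
  Sys.sleep(0.5)  # simulate an expensive Monte Carlo experiment
  paste("results for", td)
}

# Placeholder learner names; the real TODO vector comes from the book's code
TODO <- c("svm.reg", "svm.cls", "nnet.reg", "nnet.cls")
out_dir <- tempdir()  # the original loop saves into the working directory

n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# One forked job per learner (forking is not available on Windows)
results <- mclapply(TODO, function(td) {
  res <- run_learner(td)
  # Mirror assign()/save() from the original loop: store the result under
  # the learner's name and write it to its own <learner>.Rdata file
  assign(td, res)
  save(list = td, file = file.path(out_dir, paste(td, "Rdata", sep = ".")))
  res
}, mc.cores = min(n_cores, length(TODO)), mc.preschedule = FALSE)
names(results) <- TODO
```

With `mc.preschedule = FALSE`, a new worker is forked for each learner as soon as a core frees up, which helps when some learners take much longer than others. If there are fewer learners than cores, the inner (learner, windowing scheme) combinations could be flattened into one task list instead, so that every core has work.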

Upvotes: 2
