Boern

Reputation: 7752

Use of doParallel / doMC not only with foreach package

All the official tutorials I've found so far (doParallel, the doParallel vignette, doMC, and the doMC vignette) cover only how to use parallel computation in combination with foreach. Is there a way to speed up "sequential" code as well?

Imagine it as splitting one script into multiple scripts and executing each one in a separate instance of R. E.g.

## <run on core1>
data1 <- getData1()
dataResult1 <- doComplexAlgorithm1(data1)
## </run on core1>

## <run on core2>
data2 <- getData2()
dataResult2 <- doComplexAlgorithm2(data2)
## </run on core2>

## <run on core3>
data3 <- getData3()
dataResult3 <- doComplexAnotherAlgorithm3(data3)
## </run on core3>

## <run on core4>
data4 <- getData4()
dataResult4 <- doComplexNotSoComplexAlgorithm4(data4)
## </run on core4>

Thanks in advance!

(R v.3.2.1, RStudio v.0.99.451)

Upvotes: 0

Views: 539

Answers (3)

dracodoc

Reputation: 2763

So you don't need any memory sharing or communication between jobs; they are completely independent.

The foreach and lapply paradigms are designed more for splitting a loop or a vectorized computation across workers. For totally independent jobs, you need another layer to turn them into a loop: wrap each section in a function, put all the functions into a list, then call each function in a loop.

fun_list <- list(
  fun_1 = function() {
    data1 <- getData1()
    doComplexAlgorithm1(data1)
  },
  fun_2 = function() {
    data2 <- getData2()
    doComplexAlgorithm2(data2)
  }
  # ..., and likewise for fun_3 and fun_4
)
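
To actually run the jobs in parallel, one option is to hand the list to parallel::mclapply, which forks one worker per function (a minimal sketch; mclapply is fork-based and so Unix-only, so on Windows you would use a makeCluster()/parLapply() pair instead):

library(parallel)

## call each function on its own core; results come back in list order
results <- mclapply(fun_list, function(f) f(), mc.cores = 4)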

Upvotes: 0

Hong Ooi

Reputation: 57696

In the base (single-process) scenario, you'd use mapply, passing it lists of your functions:

mapply(function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData=list(getData1, getData2, getData3, getData4),
doAlg=list(algorithm1, algorithm2, algorithm3, algorithm4))

In the parallel processing case, you can use clusterMap:

library(parallel)
cl <- makeCluster(4)  # one worker per job
clusterMap(cl, function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData=list(getData1, getData2, getData3, getData4),
doAlg=list(algorithm1, algorithm2, algorithm3, algorithm4))
stopCluster(cl)
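
If you're on a Unix-alike, you can skip the cluster setup and use the fork-based mcmapply instead (a sketch with the same assumed function names):

library(parallel)

## fork-based equivalent of the clusterMap() call above (Unix-alikes only)
results <- mcmapply(function(getData, doAlg) {
    dat <- getData()
    doAlg(dat)
},
getData = list(getData1, getData2, getData3, getData4),
doAlg = list(algorithm1, algorithm2, algorithm3, algorithm4),
SIMPLIFY = FALSE, mc.cores = 4)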

Upvotes: 2

NNN

Reputation: 36

It sounds like you want to do what I try to do with images. I've got some images and some computation on each of them, which by itself takes quite long. The way I do it is to keep a list of files and loop over it; in the sketch below, readRDS()/saveRDS() and doSomething() are placeholders for your actual loading, processing, and writing code:

library(doParallel)
registerDoParallel(cores = 4)  # adjust to your core count

foreach(i = seq_along(fileList)) %dopar% {
    dat <- readRDS(fileList[[i]])    # load data
    res <- doSomething(dat)          # do something
    saveRDS(res, outFileList[[i]])   # write result to disk
}

It's just as you say: each set of data (file) is processed on its own core, provided your system has enough memory to hold it all at once.

Upvotes: 1
