Reputation: 7752
All the official tutorials I've found so far (doParallel, the doParallel vignette, doMC and the doMC vignette) cover only how to use parallel computation in combination with foreach. Is there a way to speed up "sequential" code as well?
Imagine it like splitting one file into multiple files and executing each file with a different instance of R, e.g.:
## <run on core1>
data1 <- getData1()
dataResult1 <- doComplexAlgorithm1(data1)
## </run on core1>
## <run on core2>
data2 <- getData2()
dataResult2 <- doComplexAlgorithm2(data2)
## </run on core2>
## <run on core3>
data3 <- getData3()
dataResult3 <- doComplexAnotherAlgorithm3(data3)
## </run on core3>
## <run on core4>
data4 <- getData4()
dataResult4 <- doNotSoComplexAlgorithm4(data4)
## </run on core4>
Thanks in advance!
(R v3.2.1, RStudio v0.99.451)
Upvotes: 0
Views: 539
Reputation: 2763
So it sounds like you don't need any memory sharing or communication between the jobs; they are completely independent.
The foreach and lapply paradigms are designed more for splitting a loop or a vector operation across workers. For totally independent jobs, you need to wrap another layer to turn them into a loop: wrap each section in a function, put all the functions into a list, then call each function in the loop.
fun_list <- list(
  fun_1 = function() {
    data1 <- getData1()
    doComplexAlgorithm1(data1)
  },
  fun_2 = function() {
    data2 <- getData2()
    doComplexAlgorithm2(data2)
  },
...
)
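From there, a minimal sketch of running those wrapped sections in parallel, assuming a Unix-alike system (mclapply uses fork-based workers) and that fun_list is filled in as above:
library(parallel)

# Each element of fun_list runs on its own core. Forked workers
# inherit the functions and data of the current session, so nothing
# needs to be exported explicitly.
results <- mclapply(fun_list, function(f) f(), mc.cores = 4)
On Windows, where forking is unavailable, a socket cluster gives the same effect, e.g. parLapply(cl, fun_list, function(f) f()).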
Upvotes: 0
Reputation: 57696
In the base (single-process) scenario, you'd use mapply, passing it lists of your functions:
mapply(function(getData, doAlg) {
dat <- getData()
doAlg(dat)
},
getData=list(getData1, getData2, getData3, getData4),
doAlg=list(algorithm1, algorithm2, algorithm3, algorithm4))
In the parallel processing case, you can use clusterMap:
library(parallel)
cl <- makeCluster(4)  # makeCluster() requires the number of workers; 4 is an example
clusterMap(cl, function(getData, doAlg) {
dat <- getData()
doAlg(dat)
},
getData=list(getData1, getData2, getData3, getData4),
doAlg=list(algorithm1, algorithm2, algorithm3, algorithm4))
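Remember to release the workers with stopCluster(cl) when you're done. On a Unix-alike system you could also skip the explicit cluster and use the fork-based mcmapply; a sketch under that assumption:
library(parallel)

# Same call shape as mapply, but each function/algorithm pair
# runs in a forked worker process.
mcmapply(function(getData, doAlg) {
  dat <- getData()
  doAlg(dat)
},
getData=list(getData1, getData2, getData3, getData4),
doAlg=list(algorithm1, algorithm2, algorithm3, algorithm4),
mc.cores=4)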
Upvotes: 2
Reputation: 36
It sounds like you want to do what I try to do with images. I have some images, and the computation on them by itself takes quite long. The way I do it is to keep a list of files and:
foreach(i = seq_along(fileList)) %dopar% {
  # load data
  # do something
  # write result to disk
}
It's just as you say: each set of data (file) is processed on its own core, provided your system has enough memory to hold them all at once.
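For completeness, a minimal runnable sketch of that pattern; read_image and process_image are hypothetical stand-ins for your own loading and computation steps, and the "images" folder is an assumed input location:
library(doParallel)
registerDoParallel(cores = 4)  # assuming 4 cores are available

fileList <- list.files("images", full.names = TRUE)  # hypothetical input folder

foreach(i = seq_along(fileList)) %dopar% {
  img <- read_image(fileList[i])                 # load data (hypothetical reader)
  result <- process_image(img)                   # do something (hypothetical computation)
  saveRDS(result, paste0(fileList[i], ".rds"))   # write result to disk
}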
Upvotes: 1