I’ve successfully parallelised a function (let’s call it AddOne) via the doParallel package, foreach and %dopar%, and I’m familiar with the .packages and .export arguments to foreach.
My problem is that I would like AddOne, instead of being a “stand-alone” function, to be an element of a list, and in this case I can’t get things working. Specifically, if AddOne calls a subroutine AddOneSubroutine, then AddOneSubroutine is not found in the “worker” environments even though it is “exported”.
I’m using Windows 10, and R.version yields:
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.1
year 2017
month 06
day 30
svn rev 72865
language R
version.string R version 3.4.1 (2017-06-30)
nickname Single Candle
The doParallel version I have is 1.0.10. Here’s some code that demonstrates the problem as succinctly as I could.
library(doParallel)
if (!exists("Registered")) {
  registerDoParallel(cores = detectCores(logical = TRUE))
  Registered = TRUE
}
AddOne<-function(x){AddOneSubroutine(x)}
AddOneSubroutine <-function(x){x+1}
MyList<-list()
MyList$f<-AddOne
# Not using parallel environments, works correctly when calling AddOne 3 times
Result1 = foreach(i = 1:3) %do% AddOne(i)
Result1
# Not using parallel environments, works correctly when calling MyList$f 3 times
Result2 = foreach(i = 1:3) %do% MyList$f(i)
Result2
# Using parallel environments, works correctly when calling AddOne 3 times,
# despite not explicitly using the .export argument to export AddOneSubroutine
Result3 = foreach(i = 1:3) %dopar% AddOne(i)
Result3
# Using parallel environments, fails when calling MyList$f with error
# "could not find function "AddOneSubroutine"", even though that function is "exported"
Result4 = foreach(i = 1:3,.export = "AddOneSubroutine") %dopar% MyList$f(i)
Result4
What am I failing to understand?
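For full reproducibility everywhere, let us make sure we use workers in background R sessions (on Windows, registerDoParallel(cores = ...) already does this, whereas on Linux and macOS it uses forked processes instead):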
library("doParallel")
cl <- parallel::makeCluster(detectCores(logical = TRUE))
registerDoParallel(cl)
Now, I haven't dug into the doParallel backend code in much detail, so I'm not 100% sure what causes this problem. But we know that AddOneSubroutine is indeed exported, which you can see if you use foreach(..., .verbose = TRUE), or simply by doing:
AddOneSubroutine <- function(x) { x + 1 }
y <- foreach(i = 1L, .export = "AddOneSubroutine") %dopar% {
  get("AddOneSubroutine")
}
str(y)
## List of 1
## $ :function (x)
## ..- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 20 1 40 20 40 1 1
## .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x2e475a0>
However, when MyList$f() is called, AddOneSubroutine is not found from within it, which can be confirmed with:
AddOne <- function(x) exists("AddOneSubroutine")
MyList <- list()
MyList$f <- AddOne
y <- foreach(i = 1L, .export = "AddOneSubroutine") %dopar% {
  MyList$f(i)
}
str(y)
## List of 1
## $ : logi FALSE
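For comparison, here is a minimal sketch that reproduces the same symptom with the base parallel package alone, without foreach or doParallel; it assumes a PSOCK cluster can be started on your machine, and the name cl_demo is just for illustration. One plausible explanation that it is consistent with: a function whose environment is the global environment is serialized with only a reference to "the global environment", not a copy of its contents, so on the worker MyList$f looks up AddOneSubroutine in the worker's own global environment. Copying the subroutine there with parallel::clusterExport() makes it visible:

```r
library(parallel)

cl_demo <- makeCluster(1L)  # one background R session (PSOCK worker)

AddOneSubroutine <- function(x) { x + 1 }
AddOne <- function(x) exists("AddOneSubroutine")
MyList <- list(f = AddOne)

# environment(MyList$f) is the global environment of this session.
# On the worker, MyList$f ends up with the worker's global environment
# as its enclosure, which does not contain AddOneSubroutine.
before <- clusterCall(cl_demo, function(lst) lst$f(1), MyList)[[1]]

# Copy AddOneSubroutine into the worker's global environment and retry.
clusterExport(cl_demo, "AddOneSubroutine")
after <- clusterCall(cl_demo, function(lst) lst$f(1), MyList)[[1]]

stopCluster(cl_demo)

print(c(before = before, after = after))
## expected: before FALSE, after TRUE
```

If that explanation is right, calling parallel::clusterExport(cl, "AddOneSubroutine") on the cluster registered with registerDoParallel(cl), before calling foreach(), may be another workaround, although that has not been tested here.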
So, why is AddOneSubroutine not in the frames searched from within MyList$f? This could be because doParallel does not set up the environment of MyList$f correctly. A workaround that seems to work is the following hack:
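AddOne <- function(x) { AddOneSubroutine(x) }
MyList$f <- AddOne  ## re-assign, since MyList$f still holds the exists() version
y <- foreach(i = 1L) %dopar% {
  ## hack: borrow the environment of the (auto-exported) AddOneSubroutine
  environment(MyList$f) <- environment(AddOneSubroutine)
  MyList$f(i)
}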
str(y)
## List of 1
## $ : num 2
Unfortunately, it's neither very neat nor very convenient.
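A different approach, sketched below under the assumption that you control how MyList$f is created, is to avoid the global environment altogether: build the function inside a factory (make_add_one() is a hypothetical name, not part of any package) so that AddOneSubroutine lives in the function's own enclosing environment, which is serialized along with it. The sketch uses parallel::parLapply() to stay self-contained; with doParallel one would expect foreach(i = 1:3) %dopar% MyList$f(i) to behave the same way, but that is not verified here.

```r
library(parallel)

# make_add_one() is a hypothetical helper, not part of any package.
# AddOneSubroutine is defined inside it, so it lives in the enclosing
# environment of the returned function rather than in the global environment.
make_add_one <- function() {
  AddOneSubroutine <- function(x) { x + 1 }
  function(x) { AddOneSubroutine(x) }
}

MyList <- list(f = make_add_one())

# A non-global enclosing environment is serialized together with the
# function, so AddOneSubroutine travels to the worker with MyList$f.
cl_demo <- makeCluster(1L)
y <- parLapply(cl_demo, 1:3, function(i, lst) lst$f(i), lst = MyList)
stopCluster(cl_demo)

str(y)
```

The trade-off is that everything in the factory's environment is shipped to the workers along with the function, so that environment is best kept small.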
As an alternative, the doFuture backend (I'm the author) seems to work slightly better:
library("doFuture")
registerDoFuture()
plan(multisession)
AddOneSubroutine <- function(x) { x + 1 }
AddOne <- function(x) { AddOneSubroutine(x) }
MyList <- list()
MyList$f <- AddOne
y <- foreach(i = 1L) %dopar% {
  AddOneSubroutine ## dummy guiding auto-export
  MyList$f(i)
}
str(y)
## List of 1
## $ : num 2
PS. Your particular use case interested me because, ideally, AddOneSubroutine should have been exported automatically when using doFuture, but it wasn't. I've found a fix for this in the underlying globals package (I'm the author), but I need to think more about it before publishing it.
My details:
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] doFuture_0.5.1 iterators_1.0.8 foreach_1.4.3 future_1.6.1
loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1 listenv_0.6.0 codetools_0.2-15
[5] digest_0.6.12 globals_0.10.2