Reputation: 300
I'm trying to understand what is happening behind the Rcpp::sourceCpp() call in a parallelized environment. Recently, this was partially addressed in the question Using Rcpp function in parLapply on Windows.
In that post, Dirk said:
"You need to run the sourceCpp() call in each spawned process, or else get them your code."
This was in response to the questioner's attempt to distribute an Rcpp function to the worker processes. The questioner was sending the Rcpp function via:
clusterExport(cl = cl, varlist = "payoff")
I'm confused as to why this doesn't work. My understanding was that this is exactly what clusterExport() is for.
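For concreteness, the pattern I have in mind looks something like this (cluster size and inputs are just placeholders):

library(parallel)

Rcpp::cppFunction("NumericVector payoff(double strike, NumericVector data) {
  return pmax(data - strike, 0);
}")

cl <- makeCluster(2)                        # PSOCK cluster, as on Windows
clusterExport(cl = cl, varlist = "payoff")  # ship the 'payoff' object to the workers
parLapply(cl, c(95, 100), function(s) payoff(s, as.numeric(90:110)))  # errors on the workers
stopCluster(cl)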
Upvotes: 3
Views: 1943
Reputation: 20746
The issue here is that, because of how compiled code is linked into an R process, the compiled code is not "exportable" to the spawned processes unless it is embedded in a package.
Traditionally, the clusterExport() statement distributes R-specific objects to the workers. By using clusterExport() on an Rcpp function, you are only sending the R declaration and not the underlying shared library. That is to say, the shared library built by the R CMD SHLIB call in Rcpp's Attributes.R is not shared with or exported to the workers. As a result, when a call is then made to an Rcpp function on a worker, R cannot find the correct shared library.
Take the previous question's function:
Rcpp::cppFunction("NumericVector payoff(double strike, NumericVector data) {
  return pmax(data - strike, 0);
}")
Note: I'm using cppFunction() instead of sourceCpp(), but the results are equivalent since cppFunction() calls sourceCpp() to create the function.
Typing the function name:
payoff
yields the R declaration with a shared-library pointer:
function (strike, data)
.Primitive(".Call")(<pointer: 0x1015ec130>, strike, data)
This shared library is only available in the process that compiled the function.
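You can see this without a cluster at all: clusterExport() ships objects by serializing them, and an external pointer does not survive serialization. A round trip through serialize() mimics what happens on the wire (payoff2 is just an illustrative name):

payoff2 <- unserialize(serialize(payoff, NULL))

Printing payoff2 now shows a null pointer (the exact rendering is platform-dependent):

function (strike, data)
.Primitive(".Call")(<pointer: (nil)>, strike, data)

Calling payoff2() then typically fails with an error along the lines of "NULL value passed as symbol address", since the pointer no longer refers to a loaded shared library.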
Hence, it is always ideal to embed compiled code within a package and then distribute the package.
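Short of building a package, the fix is, as quoted above, to run the compilation in each spawned process, e.g. via clusterEvalQ(). A minimal sketch, assuming Rcpp and a compiler are available on every worker (each worker pays the compilation cost once):

library(parallel)

cl <- makeCluster(2)

# Compile on the workers themselves, so each process gets its own
# shared library and a valid pointer.
clusterEvalQ(cl, {
  Rcpp::cppFunction("NumericVector payoff(double strike, NumericVector data) {
    return pmax(data - strike, 0);
  }")
})

parLapply(cl, c(95, 100), function(s) payoff(s, as.numeric(90:110)))
stopCluster(cl)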
Upvotes: 8