doMPI and user-defined packages

I am starting to use doMPI together with a package I wrote myself.

First, the script I execute contains:

library(doMPI)
cl <- startMPIcluster() 
registerDoMPI(cl)

Note: I am not using cl <- startMPIcluster(count), because I believe it is better to specify the number of cores outside the script, but I am not sure whether calling cl <- startMPIcluster() with no arguments is the right way to do that.

Then, after loading my package with library(my_package), I run:

myres <- foreach(t2 = 1550:1551) %dopar% {my_function(t2)}

Running this with mpirun results in:

Evaluation error: could not find function "my_function"

The function itself works on the cluster: when I run my_function without MPI, the results are correct.

I then changed the call to:

myres <- foreach(t2 = 1550:1551) %dopar% {my_package::my_function(t2)}

With this change the job starts to run, so adding my_package:: lets the code run under mpirun find my_function. That seems odd, since I had already called library(my_package).

But shortly after it starts running, there is another error:

Evaluation error: could not find function "my_function_2".

my_function_2 is a function defined in my package.

Of course, one option would be to go through every function in my package and prefix each call to another package function with my_package::.

But I don't believe that should be necessary, so I suspect I am misusing something in a way I cannot see.

Any idea what could be going wrong? Thank you in advance.


Answers (1)

Steve Weston


You should initialize the workers using the foreach .packages option:

myres <-
  foreach(t2=1550:1551, .packages='my_package') %dopar% {
    my_function(t2)
  }

This causes each of the cluster workers to load my_package. Loading a package in the master process doesn't cause it to be loaded by the cluster workers, which is why .packages is necessary.
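To see the effect of .packages without needing my_package, here is a minimal sketch that uses the tools package as a stand-in (an assumption for illustration: tools ships with R but is not attached by default, so it behaves like a user package in this respect). The variable names are hypothetical.

```r
library(doMPI)

# With no count argument, the cluster size is taken from mpirun.
cl <- startMPIcluster()
registerDoMPI(cl)

# 'tools' stands in for my_package here.
# .packages = "tools" makes every worker load it before running the body,
# so file_ext() can be called without the tools:: prefix.
ext <- foreach(f = c("a.txt", "b.csv"), .packages = "tools") %dopar% {
  file_ext(f)
}
print(ext)

closeCluster(cl)
```

Replacing "tools" with "my_package" and file_ext(f) with my_function(t2) gives the call shown above. When the script is launched through mpirun, it is usual to end it with mpi.quit() after closeCluster(cl).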

As for startMPIcluster, I never use the count argument unless I'm executing the R script without mpirun (which limits you to running on a single node). If count isn't specified, startMPIcluster gets all its information from mpirun, making the script more flexible.
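If the same script has to work both with and without mpirun, one possible sketch is to check the MPI world size first. This is an assumption about how you might structure it, not something doMPI requires; the worker count of 2 and the variable names are arbitrary, and spawning workers without mpirun depends on your MPI installation supporting it.

```r
library(doMPI)

# mpi.comm.size(0) is the number of MPI processes in the world communicator.
# A value above 1 means the script was started through mpirun.
if (mpi.comm.size(0) > 1) {
  cl <- startMPIcluster()           # cluster size decided by mpirun
} else {
  cl <- startMPIcluster(count = 2)  # spawn 2 workers on the local node
}
registerDoMPI(cl)

n_workers <- clusterSize(cl)
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2
print(squares)

closeCluster(cl)
```

Under mpirun the script would be launched with something like mpirun -n 4 Rscript my_script.R (my_script.R is a hypothetical file name), where one of the processes acts as the master and the rest become workers.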

