Arpit

Reputation: 53

Error in parallel processing with foreach: "could not find function "%dopar%""

I am having issues with parallel processing using the foreach function in R.

The following code works perfectly:

library(foreach)
library(doParallel)

city_list <- c("city1", "city2")
date_list <- c("date1", "date2")

city_date_list <- foreach(city=city_list, .combine='c') %do% {
  foreach(date = date_list, .combine='c') %do% {
    city_date <- paste(city, date)
    city_date
  }
}
print(city_date_list)

[1] "city1 date1" "city1 date2" "city2 date1" "city2 date2"

However, when I switch from %do% to %dopar%, the code starts throwing errors. This is the updated code for parallel processing:

library(foreach)
library(doParallel)

city_list <- c("city1", "city2")
date_list <- c("date1", "date2")
myCluster <- makeCluster(4, type="PSOCK")
registerDoParallel(myCluster)


city_date_list <- foreach(city=city_list, .combine='c') %dopar% {
  foreach(date = date_list, .combine='c') %dopar% {
    city_date <- paste(city, date)
    city_date
  }
}

stopCluster(myCluster)

print(city_date_list)

This is the output generated:

> city_date_list <- foreach(city=city_list, .combine='c') %dopar% {
+   foreach(date = date_list, .combine='c') %dopar% {
+     city_date <- paste(city, date)
+     city_date
+   }
+ }
Error in { : task 1 failed - "could not find function "%dopar%""
> 
> stopCluster(myCluster)
> 
> print(city_date_list)
Error in print(city_date_list) : object 'city_date_list' not found

I am not sure what is causing the error. These are the details of the session I am running:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.11 iterators_1.0.9   foreach_1.4.4    

loaded via a namespace (and not attached):
[1] compiler_3.4.3   magrittr_1.5     tools_3.4.3      yaml_2.1.18      stringi_1.1.7    codetools_0.2-15 knitr_1.20      
[8] stringr_1.3.0   

Any ideas on how to rectify this?

Upvotes: 2

Views: 1336

Answers (2)

Ralf Stubner

Reputation: 26823

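When nesting foreach loops, you should use the nesting operator %:% on all but one loop, and put %dopar% only on the innermost one. Applied to your example (with the cluster registered as in your question):

library(foreach)  # provides %:% and %dopar%
city_date_list <- foreach(city = city_list, .combine = 'c') %:%
  foreach(date = date_list, .combine = 'c') %dopar% {
    paste(city, date)
  }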

See the nesting vignette (vignette("nested", package = "foreach")) for details.
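A complete, self-contained sketch of the %:% approach, reusing the question's data. The only assumption beyond the question is a 2-worker cluster instead of 4, which is plenty for four tasks. Because %:% turns the two loops into a single stream of tasks, the code that runs on the workers is just the innermost body, so foreach never has to be found there:

```r
library(foreach)
library(doParallel)

city_list <- c("city1", "city2")
date_list <- c("date1", "date2")

# Two PSOCK workers are enough for this toy example (assumption, not a requirement)
myCluster <- makeCluster(2, type = "PSOCK")
registerDoParallel(myCluster)

# %:% merges the two loops into one foreach object, dispatched once with %dopar%.
# The body sent to the workers only calls paste(), so the workers do not
# need the foreach package attached.
city_date_list <- foreach(city = city_list, .combine = 'c') %:%
  foreach(date = date_list, .combine = 'c') %dopar% {
    paste(city, date)
  }

stopCluster(myCluster)
print(city_date_list)
```

The inner .combine = 'c' concatenates the dates for one city, and the outer .combine = 'c' concatenates those per-city vectors in order, so the result should match the sequential %do% version from the question.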

Upvotes: 2

loki

Reputation: 10340

When you want to use additional packages inside the foreach loop, you have to load them on the parallel cluster nodes, because each PSOCK worker is a separate R session. Therefore, you have to use the .packages parameter (not .export, which is for variables) in your foreach call:

city_date_list <- foreach(city=city_list, .combine='c', 
                          .packages = c("foreach") # this does the trick
                          ) %dopar% {
  foreach(date = date_list, .combine='c') %dopar% {
    city_date <- paste(city, date)
    city_date
  }
}

As we learn from ?foreach:

.packages --> character vector of packages that the tasks depend on. If ex requires a R package to be loaded, this option can be used to load that package on each of the workers. Ignored when used with %do%.

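This is why the error does not occur in your first example (%do% evaluates the loop in your current session, where foreach is attached) but does occur in the second one (%dopar% evaluates it on the cluster workers).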
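A small diagnostic sketch that makes the difference visible. It uses only the parallel package; the names cl, before and after are illustrative. It checks whether foreach is on each worker's search path, then attaches it with clusterEvalQ, which is roughly the effect .packages = "foreach" has on the workers (an approximation, not how foreach implements it internally):

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Each PSOCK worker is a fresh R session: packages attached in the
# master session are not attached on the workers.
before <- unlist(clusterEvalQ(cl, "package:foreach" %in% search()))

# Attach foreach on every worker, similar in effect to .packages = "foreach".
invisible(clusterEvalQ(cl, library(foreach)))
after <- unlist(clusterEvalQ(cl, "package:foreach" %in% search()))

stopCluster(cl)
print(before)
print(after)
```

On a default setup (no .Rprofile attaching foreach), before should be all FALSE and after all TRUE.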

However, I'm not sure a nested foreach is necessary or useful here: no parallel backend is registered on the workers, so the inner %dopar% just runs sequentially there.
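If the goal is simply to process every city/date combination in parallel, one option (a different technique from the nested loop above, offered as a sketch) is to flatten the two loops into one. The data frame pairs is a hypothetical helper built with expand.grid; foreach iterates over several arguments in lockstep, so a single %dopar% covers all pairs:

```r
library(foreach)
library(doParallel)

city_list <- c("city1", "city2")
date_list <- c("date1", "date2")

# expand.grid varies its first column fastest, so date goes first to keep
# the same city-major order as the nested loops.
pairs <- expand.grid(date = date_list, city = city_list, stringsAsFactors = FALSE)

myCluster <- makeCluster(2, type = "PSOCK")
registerDoParallel(myCluster)

# city and date are taken pairwise from the two columns; the workers never
# call foreach, so no .packages argument is needed.
city_date_list <- foreach(city = pairs$city, date = pairs$date, .combine = 'c') %dopar% {
  paste(city, date)
}

stopCluster(myCluster)
print(city_date_list)
```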

Upvotes: 1
