Reputation: 53
I am having issues with parallel processing using the foreach function in R.
The following code works perfectly:
library(foreach)
library(doParallel)
city_list <- c("city1", "city2")
date_list <- c("date1", "date2")
city_date_list <- foreach(city=city_list, .combine='c') %do% {
  foreach(date = date_list, .combine='c') %do% {
    city_date <- paste(city, date)
    city_date
  }
}
print(city_date_list)
[1] "city1 date1" "city1 date2" "city2 date1" "city2 date2"
However, when I change %do% to %dopar%, the code throws an error. This is the updated code for parallel processing:
library(foreach)
library(doParallel)
city_list <- c("city1", "city2")
date_list <- c("date1", "date2")
myCluster <- makeCluster(4, type="PSOCK")
registerDoParallel(myCluster)
city_date_list <- foreach(city=city_list, .combine='c') %dopar% {
  foreach(date = date_list, .combine='c') %dopar% {
    city_date <- paste(city, date)
    city_date
  }
}
stopCluster(myCluster)
print(city_date_list)
This is the output generated:
> city_date_list <- foreach(city=city_list, .combine='c') %dopar% {
+ foreach(date = date_list, .combine='c') %dopar% {
+ city_date <- paste(city, date)
+ city_date
+ }
+ }
Error in { : task 1 failed - "could not find function "%dopar%""
>
> stopCluster(myCluster)
>
> print(city_date_list)
Error in print(city_date_list) : object 'city_date_list' not found
I am not sure what the error is. These are the details of the session I am running.
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C LC_TIME=English_India.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.11 iterators_1.0.9 foreach_1.4.4
loaded via a namespace (and not attached):
[1] compiler_3.4.3 magrittr_1.5 tools_3.4.3 yaml_2.1.18 stringi_1.1.7 codetools_0.2-15 knitr_1.20
[8] stringr_1.3.0
Any ideas on how to rectify this?
Upvotes: 2
Views: 1336
Reputation: 26823
When nesting foreach loops you should use the nesting operator %:% on all but one loop: the outer loops are joined with %:% and only the innermost loop takes %dopar%. Pseudo code:
foreach (...) %:%
foreach (...) %dopar%
....
See the nesting vignette for details.
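Applied to the code from the question, the pseudo code above might look like the following. This is a minimal sketch: the two-worker cluster size is an assumption chosen for the toy example, and everything else is taken from the question.

```r
library(foreach)
library(doParallel)

city_list <- c("city1", "city2")
date_list <- c("date1", "date2")

# Assumption: two workers are plenty for this toy example
myCluster <- makeCluster(2, type = "PSOCK")
registerDoParallel(myCluster)

# %:% merges both loops into a single stream of tasks;
# only the innermost loop carries %dopar%
city_date_list <- foreach(city = city_list, .combine = 'c') %:%
  foreach(date = date_list, .combine = 'c') %dopar% {
    paste(city, date)
  }

stopCluster(myCluster)
print(city_date_list)
```

With the default in-order combining, this should print the same four strings as the %do% version in the question.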
Upvotes: 2
Reputation: 10340
When you want to use additional libraries inside the foreach loop, you have to load them on the parallel cluster nodes. To do so, use the .packages parameter in your foreach function call:
city_date_list <- foreach(city=city_list, .combine='c',
                          .packages = c("foreach") # this does the trick
                          ) %dopar% {
  foreach(date = date_list, .combine='c') %dopar% {
    city_date <- paste(city, date)
    city_date
  }
}
As we learn from ?foreach:
.packages --> character vector of packages that the tasks depend on. If ex requires a R package to be loaded, this option can be used to load that package on each of the workers. Ignored when used with %do%.
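Thus, the error does not occur in your first example (which uses %do%) but only in the second (which uses %dopar%), because the workers are separate R sessions that do not have foreach attached.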
However, I'm not quite sure if a nested foreach is necessary / useful.
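For illustration, one way to avoid nesting altogether is to build all city/date combinations up front and run a single parallel loop over them. This is only a sketch under assumptions: the `combos` data frame and the two-worker cluster are not part of the question, and whether a flat loop suits the real workload depends on what each task actually does.

```r
library(foreach)
library(doParallel)

city_list <- c("city1", "city2")
date_list <- c("date1", "date2")

# Hypothetical helper table: one row per (city, date) pair.
# expand.grid varies its first argument fastest, so dates cycle within each city.
combos <- expand.grid(date = date_list, city = city_list,
                      stringsAsFactors = FALSE)

# Assumption: two workers are plenty for this toy example
myCluster <- makeCluster(2, type = "PSOCK")
registerDoParallel(myCluster)

# A single foreach iterating over both columns in lockstep
city_date_list <- foreach(city = combos$city, date = combos$date,
                          .combine = 'c') %dopar% {
  paste(city, date)
}

stopCluster(myCluster)
print(city_date_list)
```

Since there is only one loop, no worker ever has to call %dopar% itself, so nothing extra needs to be loaded on the nodes for this example.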
Upvotes: 1