mauna

Reputation: 1118

What are the caveats for calling other packages inside doMC's foreach and dopar?

This code works as expected:

library(dplyr)
data <- list(t1 = "hello world.", t2 = "bye world")

library(doMC)
registerDoMC(3)

res <- foreach(t = data) %dopar% {

    print(sprintf("processing %s", t))

    data.frame(text = t) %>%
    dplyr::count(text)

}

print(res)

However, this code prints "processing hello world." and "processing bye world" and then hangs (no exception is thrown).

library(dplyr)
coreNLP::initCoreNLP()

data <- list(t1 = "hello world.", t2 = "bye world")

library(doMC)
registerDoMC(3)

res <- foreach(t = data) %dopar% {

    print(sprintf("processing %s", t))

    coreNLP::annotateString(t)$token

}

print(res)

The code above will work as expected if I change %dopar% to %do%.

I do not understand what is causing this behavior. Why does calling coreNLP functions inside %dopar% cause R to hang, when other packages work fine? Does this have something to do with coreNLP's dependency on Java?

Here's the output of sessionInfo():

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.0

Upvotes: 0

Views: 115

Answers (1)

HenrikB

Reputation: 6815

Your first example works just fine for me on what looks like a similar setup. My session info after running the example is below; make sure to try again with a fresh R session (R --vanilla). I have four cores (from parallel::detectCores()).

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] doMC_1.3.4      iterators_1.0.8 foreach_1.4.3   dplyr_0.5.0    

loaded via a namespace (and not attached):
[1] compiler_3.4.0   magrittr_1.5     R6_2.2.0         assertthat_0.2.0
[5] DBI_0.6-1        tibble_1.3.0     Rcpp_0.12.10     codetools_0.2-15

Your second example does not work for me either. The output is as below. My guess is that the forked processes cannot share the same underlying Java process/service that coreNLP relies on, but I don't really know coreNLP.

> res <- foreach(t = data) %dopar% {
+ 
+     print(sprintf("processing %s", t))
+ 
+     coreNLP::annotateString(t)$token
+ 
+ }
[1] "processing hello world."
[1] "processing bye world"


^CError in selectChildren(ac, 1) : 
  Java called System.exit(130) requesting R to quit - trying to recover
Error during wrapup: C stack usage  591577121812 is too close to the limit

 *** caught segfault ***
address 0x2, cause 'memory not mapped'
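
If that guess is right, one thing that might help is to avoid forking altogether and use a PSOCK cluster, where each worker is a fresh R process that does its own setup. Below is a minimal sketch of that pattern using only the base parallel package. The setup and the per-item work are stand-ins so the sketch runs without Java: the comments mark where coreNLP::initCoreNLP() and coreNLP::annotateString() would go, and I have not verified that coreNLP behaves well under this setup.

```r
library(parallel)

data <- list(t1 = "hello world.", t2 = "bye world")

# PSOCK workers are separate, freshly started R processes (not forks of
# this session), so they inherit nothing from the parent, including any JVM.
cl <- makeCluster(2)

# Per-worker setup: evaluated once in every worker.
# Assumption: in the coreNLP case, coreNLP::initCoreNLP() would be called
# here so that each worker starts its own Java backend.
worker_pids <- clusterEvalQ(cl, {
    worker_ready <- TRUE   # stand-in for coreNLP::initCoreNLP()
    Sys.getpid()
})

res <- parLapply(cl, data, function(t) {
    # stand-in for coreNLP::annotateString(t)$token
    strsplit(t, " ", fixed = TRUE)[[1]]
})

stopCluster(cl)

print(res)
```

To keep the foreach loop from the question, the same cluster object could be registered with doParallel::registerDoParallel(cl) instead of registerDoMC(3). The cost is that every worker pays the full initialization time and memory, which can be substantial for something like coreNLP.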

Upvotes: 1
