s.brunel

Reputation: 1043

RJDBC: parallel queries with parallelMap

I'm trying to run my queries in parallel and I get this error: 00001: Error in .jcheck() : No running JVM detected. Maybe .jinit() would help. The queries work when I run them one by one.

I know it's not really reproducible, but I can't give you my login/password :)

I tried calling .jinit() and Sys.setenv(JAVA_HOME='C:\\Program Files\\Java\\jdk1.8.0_102') in the worker, but it's not working.

My script:

library(RJDBC)
library(parallelMap)

jdbcDriver <- JDBC(driverClass="oracle.jdbc.OracleDriver",  classPath="ojdbc6.jar" )
jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@//mybase", "login", "pass")

query_list<- list( "SELECT * FROM table1",
                   "SELECT * FROM table2",
                   "SELECT * FROM table3",
                   "SELECT * FROM table4", 
                   "SELECT * FROM table5")


import_base_fonction <- function(query) {
  dbGetQuery(jdbcConnection, query)
}


parallelStartSocket( 5 ) 

parallelLibrary("RJDBC","rJava")
parallelExport("query_list", "import_base_fonction", "jdbcConnection")

mes_tables <- parallelMap(import_base_fonction,query_list)  

parallelStop() 

My session info:

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] parallelMap_1.3      PhViD_1.0.8          MCMCpack_1.4-0       MASS_7.3-47          coda_0.19-1          LBE_1.44.0           dplyr_0.7.1         
 [8] plyr_1.8.4           shiny_1.0.3          DT_0.2               shinydashboard_0.6.1 data.table_1.10.4    RJDBC_0.2-5          rJava_0.9-8         
[15] DBI_0.7             

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11       compiler_3.4.1     bindr_0.1          tools_3.4.1        digest_0.6.12      checkmate_1.8.3    tibble_1.3.3       lattice_0.20-35   
 [9] pkgconfig_2.0.1    rlang_0.1.1        Matrix_1.2-10      parallel_3.4.1     SparseM_1.77       bindrcpp_0.2       htmlwidgets_0.9    MatrixModels_0.4-1
[17] grid_3.4.1         glue_1.1.1         R6_2.2.2           magrittr_1.5       backports_1.1.0    BBmisc_1.11        htmltools_0.3.6    mcmc_0.9-5        
[25] assertthat_0.2.0   mime_0.5           xtable_1.8-2       httpuv_1.3.5       quantreg_5.33    

The database is on an Oracle 11.xx server.

Please guide.

Upvotes: 15

Views: 1048

Answers (2)

Sfroehlich

Reputation: 21

In general, you have to stand up a new database connection for each process as the actual network connection does not get passed to the child workers.

So your function should look like this:

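import_base_fonction <-
   function(query) {
      # Create the driver inside the worker too: JDBC() starts the JVM in that
      # process, and Java object references do not survive being exported.
      jdbc_drv <- JDBC(driverClass = "oracle.jdbc.OracleDriver", classPath = "ojdbc6.jar")
      jdbc_conn <- dbConnect(jdbc_drv, "jdbc:oracle:thin:@//mybase", "login", "pass")
      query_output <- dbGetQuery(jdbc_conn, query)
      dbDisconnect(jdbc_conn)
      return(query_output)
   }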

If you're doing hundreds of logins, you may need to use the base parallel package directly and log in just once per worker with parallel::clusterEvalQ() before running the job across the workers with parallel::parLapplyLB() or something similar. In that case, it's also best practice to run parallel::clusterEvalQ(cl, {dbDisconnect(jdbc_conn)}) before you call parallel::stopCluster().
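A minimal sketch of that one-login-per-worker approach, reusing the placeholder URL, credentials and jar path from the question. The names run_queries_per_worker and fetch_on_worker are made up for illustration, and I have not run this against a real Oracle server, so treat it as a starting point:

```r
# Runs on a worker; jdbc_conn is found in that worker's global environment,
# where the clusterEvalQ() call below creates it.
fetch_on_worker <- function(query) dbGetQuery(jdbc_conn, query)

# Hypothetical helper: one JDBC login per worker, reused for every query.
run_queries_per_worker <- function(queries, n_workers = 5) {
  cl <- parallel::makeCluster(n_workers)
  # On exit (even after an error): close worker connections, then stop workers.
  on.exit({
    try(parallel::clusterEvalQ(cl, { dbDisconnect(jdbc_conn); NULL }),
        silent = TRUE)
    parallel::stopCluster(cl)
  }, add = TRUE)

  # Evaluated once on each worker: JDBC() starts the JVM there, then we log in.
  parallel::clusterEvalQ(cl, {
    library(RJDBC)
    jdbc_drv <- JDBC(driverClass = "oracle.jdbc.OracleDriver",
                     classPath = "ojdbc6.jar")
    jdbc_conn <- dbConnect(jdbc_drv, "jdbc:oracle:thin:@//mybase",
                           "login", "pass")
    NULL  # do not ship the connection object back to the master
  })

  # Load-balanced apply: each query goes to whichever worker is free.
  parallel::parLapplyLB(cl, queries, fetch_on_worker)
}
```

With the question's list, the call would be mes_tables <- run_queries_per_worker(query_list, n_workers = 5). The worker function is defined at top level on purpose, so that only the function itself, and not the cluster object, gets sent to the workers.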

Upvotes: 0

aristotll

Reputation: 9177

I think you can change import_base_fonction to

import_base_fonction <- function(query) {
  .jinit("ojdbc6.jar")
  return(dbGetQuery(jdbcConnection, query))
}

Upvotes: 0
