Brandon Bertelsen

Reputation: 44648

foreach with database connection freezes with no error, forever

library(doParallel)
library(RMySQL)

no_cores <- as.integer(system('getconf _NPROCESSORS_ONLN', intern = TRUE)) - 1
cluster <- makeCluster(no_cores)
registerDoParallel(cluster)

clusterEvalQ(
  cluster, {
    mysql <- RMySQL::dbConnect(...)
  }
)

r <- foreach(i = 1:50, .verbose = TRUE) %dopar% { dbGetQuery(mysql, 'show tables;')}

no variables are automatically exported

There's no error, no complaint. Nothing, it just freezes. I can start and use a cluster without database connections.

Thoughts?

Upvotes: 1

Views: 587

Answers (1)

Steve Weston

Reputation: 19677

When does it hang? When calling clusterEvalQ or the foreach loop?

I have a few suggestions:

  • Use outfile="" when creating the cluster to get debug output;
  • Load RMySQL when initializing the cluster;
  • Return NULL from clusterEvalQ to avoid serializing connection objects;
  • Make sure you call registerDoParallel so the tasks aren't executed locally.
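The last suggestion can be checked without a database. The following is a minimal sketch, assuming doParallel is installed and that two local workers are enough for the check. It asks foreach which backend is registered and compares the process IDs that ran the tasks with the master's PID. The variable names (master_pid, worker_pids, ran_on_workers) are illustrative only.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# Ask foreach what it will use for %dopar%.
backend_registered <- getDoParRegistered()  # TRUE once a backend is registered
backend_name <- getDoParName()              # name of the registered backend
n_workers <- getDoParWorkers()              # number of workers foreach will use

# Each task reports the PID of the process that executed it.
master_pid <- Sys.getpid()
worker_pids <- foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()

# If the master PID shows up among the results, tasks ran locally
# rather than on the cluster.
ran_on_workers <- !(master_pid %in% worker_pids)

stopCluster(cl)
registerDoSEQ()  # drop the reference to the stopped cluster
```

If ran_on_workers is FALSE, or getDoParRegistered() returns FALSE, the loop is not using the cluster at all, and any connection created with clusterEvalQ would never be seen by the tasks.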

Here's a test that uses these suggestions:

library(doParallel)
cl <- makePSOCKcluster(3, outfile="")
registerDoParallel(cl)

clusterEvalQ(cl, {
  library(RMySQL)
  mysql <- dbConnect(MySQL(), user='root',
                     password='notmypasswd', dbname='mysql')
  NULL
})

r <-
  foreach(i=1:50, .verbose=TRUE) %dopar% {
    dbGetQuery(mysql, 'show tables;')
  }
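
The pattern in this test (create one object per worker, return NULL so it is never sent back to the master, then refer to it by name inside the loop) can be tried without a MySQL server. A minimal sketch, with a plain environment standing in for the database connection; handle and its pid field are hypothetical names and not part of RMySQL:

```r
library(doParallel)

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# Create one stand-in "connection" per worker, in that worker's global
# environment. Returning NULL keeps the object from being serialized
# and shipped back to the master.
init_result <- clusterEvalQ(cl, {
  handle <- new.env()
  handle$pid <- Sys.getpid()  # records which process owns this handle
  NULL
})

# Tasks refer to `handle` by name. It is looked up on the worker and is
# not exported from the master, where it was never created.
owner_pids <- foreach(i = 1:6, .combine = c) %dopar% {
  handle$pid
}

# Check whether the master's global environment has its own `handle`.
handle_on_master <- exists("handle", envir = globalenv(), inherits = FALSE)

stopCluster(cl)
registerDoSEQ()
```

Every value in owner_pids should be a worker PID, which shows that each task used the object created on its own worker.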

This test works for me. When I run it, I see messages like:

no variables are automatically exported
numValues: 50, numResults: 0, stopped: TRUE
got results for task 1
numValues: 50, numResults: 1, stopped: TRUE
returning status FALSE
got results for task 2

If you only see:

no variables are automatically exported

and then it hangs, the workers are presumably stuck trying to perform the query over the database connection. That sounds like a MySQL problem to me, but I'm not a MySQL expert.
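
One way to confirm where a worker stalls is to bracket the suspect call with messages and rely on outfile="" to show them on the master's console. This is only a sketch: Sys.sleep stands in for dbGetQuery(mysql, 'show tables;') so it runs without a database, and the message wording is my own. Depending on the front end (some GUIs do not display worker output), the messages may only be visible when R runs in a terminal.

```r
library(doParallel)

# outfile = "" sends the workers' output to the master's console
cl <- makePSOCKcluster(2, outfile = "")
registerDoParallel(cl)

r <- foreach(i = 1:4) %dopar% {
  message(sprintf("[pid %d] task %d: before query", Sys.getpid(), i))
  Sys.sleep(0.1)  # stand-in for dbGetQuery(mysql, 'show tables;')
  message(sprintf("[pid %d] task %d: after query", Sys.getpid(), i))
  i
}

stopCluster(cl)
registerDoSEQ()
```

With the real query in place, a "before query" line that is never followed by its "after query" line would point at the database call rather than at foreach.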

Upvotes: 5
