Dominik

Reputation: 792

Setup torque/moab cluster to use multiple cores per node with a single loop

This is a follow-up to [How to set up doSNOW and SOCK cluster with Torque/MOAB scheduler?]

I have a memory-limited script that uses only one foreach loop, but I'd like to get 2 iterations running on node1 and 2 iterations running on node2. The linked question shows how to start a SOCK cluster with one worker per node for an outer loop and then an MC cluster for an inner loop, but with only a single loop I don't think that makes use of the multiple cores on each node. If I call registerDoMC(2) after registerDoSNOW(cl), I get this warning:

Warning message: closing unused connection 3 (<-compute-1-30.local:11880)

Thanks.
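
For reference, here is roughly the nested pattern from the linked question that I mean (sketched from memory, so treat the details as approximate; the inner worker count of 2 is just an example):

library(doSNOW)
nodes <- unique(readLines(Sys.getenv('PBS_NODEFILE')))
cl <- makeSOCKcluster(nodes)   # one SOCK worker per node for the outer loop
registerDoSNOW(cl)
r <- foreach(i=seq_along(nodes)) %dopar% {
  library(doMC)
  registerDoMC(2)              # fork 2 cores on this node for the inner loop
  foreach(j=1:2, .combine='c') %dopar% Sys.info()[['nodename']]
}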

EDIT: The solution from the previous question works fine for the problem asked there. See my example below for what I want.

starting an interactive job with 2 nodes and 2 cores per node:

qsub -I -l nodes=2:ppn=2

after starting R:

library(doParallel)
# read the node names allocated by Torque (fall back to localhost when not under PBS)
f <- Sys.getenv('PBS_NODEFILE')
nodes <- unique(if (nzchar(f)) readLines(f) else 'localhost')
print(nodes)

here are the two nodes I'm running on:

[1] "compute-3-15" "compute-1-32"

start the SOCK cluster on these two nodes:

cl <- makePSOCKcluster(nodes, outfile='')

I'm not sure why they both seem to be on compute-3-15...?

starting worker pid=25473 on compute-3-15.local:11708 at 16:54:17.048
starting worker pid=14746 on compute-3-15.local:11708 at 16:54:17.523

but register the two nodes and run a single foreach loop:

registerDoParallel(cl)
r <- foreach(i=seq(1,6), .combine='c') %dopar% { Sys.info()[['nodename']] }
print(r)

output of r indicates that both nodes were used though:

 [1] "compute-3-15.local" "compute-1-32.local" "compute-3-15.local"
 [4] "compute-1-32.local" "compute-3-15.local" "compute-3-15.local"

now, what I'd really like is for that foreach loop to run on 4 cores, 2 on each node.

library(doMC)
registerDoMC(4)
r <- foreach(i=seq(1,6), .combine='c') %dopar% { Sys.info()[['nodename']] }
print(r)

the output indicates that only 1 node was used, but presumably both cores on that one node.

[1] "compute-3-15.local" "compute-3-15.local" "compute-3-15.local"
[4] "compute-3-15.local" "compute-3-15.local" "compute-3-15.local"

How do I get a SINGLE foreach loop to use multiple cores on multiple nodes?

Upvotes: 2

Views: 831

Answers (1)

Steve Weston

Reputation: 19677

In order to use multiple nodes with foreach/doParallel, you specify a vector of hostnames when calling makePSOCKcluster. If you want to use multiple cores on those hosts, you simply specify the hostnames multiple times so that makePSOCKcluster will start multiple workers per host.
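
For example, this would start two workers on each of two nodes (the hostnames here are just placeholders):

library(doParallel)
# two entries per hostname => two PSOCK workers on that host
cl <- makePSOCKcluster(c('node1', 'node1', 'node2', 'node2'))
registerDoParallel(cl)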

Since you're using the Torque resource manager, you could use the following function to generate the node list; it caps the number of workers started on any one node:

getnodelist <- function(maxpernode=100) {
  f <- Sys.getenv('PBS_NODEFILE')
  # PBS_NODEFILE lists one line per allocated core; fall back to a dummy list otherwise
  x <- if (nzchar(f)) readLines(f) else rep('localhost', 3)
  # count the cores per node, then repeat each hostname at most maxpernode times
  d <- as.data.frame(table(x), stringsAsFactors=FALSE)
  rep(d$x, pmin(d$Freq, maxpernode))
}

Here's an example that uses this function to run no more than two workers on each node that was allocated by Torque:

library(doParallel)
nodelist <- getnodelist(2)
print(nodelist)
cl <- makePSOCKcluster(nodelist, outfile='')
registerDoParallel(cl)
r <- foreach(i=seq_along(nodelist), .combine='c') %dopar% {
  Sys.info()[['nodename']]
}
cat('results:\n')
print(r)

Note that you cannot use the doMC backend to execute tasks on multiple nodes, since doMC uses the mclapply function which can only create workers on the local machine. To use multiple nodes, you have to use a backend such as doParallel, doSNOW, or doMPI.
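
If you prefer doSNOW, a minimal sketch of the equivalent setup (reusing the getnodelist function above; makeSOCKcluster comes from the snow package that doSNOW loads) would look like this:

library(doSNOW)
nodelist <- getnodelist(2)
cl <- makeSOCKcluster(nodelist)   # SOCK workers spread across the allocated nodes
registerDoSNOW(cl)
r <- foreach(i=seq_along(nodelist), .combine='c') %dopar% Sys.info()[['nodename']]
print(r)
stopCluster(cl)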

Upvotes: 2
