Reputation: 484
I'm running R in parallel, which works perfectly on localhost. Now I want to switch to a multi-node setup, so I created several virtual machines on the same network. However, when I try to set up the cluster, it fails with the following error:
Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b", :
cannot open the connection
Calls: <Anonymous> ... doTryCatch -> recvData -> makeSOCKmaster ->
socketConnection
In addition: Warning message:
In socketConnection(master, port = port, blocking = TRUE, open = "a+b", :
ubuntu-r-node1:11056 cannot be opened
Minimal reproducible example:
library("parallel")
cl <- makeCluster(c(rep("192.168.42.26",2),rep("192.168.42.32",2)),outfile = "")
I also tried simply opening a socket on localhost, and that fails with the same error message (even though a localhost-only cluster works):
socketConnection("localhost", port = 11056, blocking = TRUE, open = "a+b")
socketConnection only works if I add the server = TRUE option, but I'm not sure whether this option is appropriate for makeCluster or how to set it.
I have a fresh install of Ubuntu Server 16.04, the iptables rules are empty (ACCEPT all), and ssh works in both directions, so I have no idea why it does not work.
Upvotes: 1
Views: 5129
Reputation: 6805
If there is a firewall issue involved here, then as an alternative to:
library("parallel")
workers <- c(rep("192.168.42.26",2), rep("192.168.42.32",2))
cl <- makeCluster(workers, outfile = "")
which is equivalent to:
cl <- makePSOCKcluster(workers, outfile = "")
you could try to use:
library("future")
cl <- makeClusterPSOCK(workers, revtunnel = TRUE, outfile = "", verbose = TRUE)
The latter sets up a so-called reverse SSH tunnel, which is an "internal" part of the outgoing SSH connection from master to worker. If the firewall prevents the workers from connecting back to the master when using parallel::makePSOCKcluster(), for instance because the port range is blocked, then future::makeClusterPSOCK(..., revtunnel = TRUE) works around that problem. The verbose = TRUE output should show something like:
Starting worker #1 on '192.168.42.26': 'ssh' -R 11356:localhost:11356 192.168.42.26 "'Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11356 OUT= TIMEOUT=2592000 XDR=TRUE"
Waiting for worker #1 on '192.168.42.26' to connect back
Connection with worker #1 on '192.168.42.26' established
[...]
What this shows is that, as far as the worker 192.168.42.26 knows, it is connecting back to a master process that it thinks runs on the same machine (MASTER=localhost, PORT=11356). This happens because the reverse SSH tunnel (-R 11356:localhost:11356) maps the port on that machine back to the master via the SSH connection.
If this reverse tunneling approach doesn't work for you, I think you have to ask your sysadmin for more details on which ports are blocked, and so on.
I hope this makes sense.
Upvotes: 2
Reputation: 19677
The socketConnection error happens when a worker tries to connect to the master process, probably because at least one of the workers can't resolve the master's hostname, which is "ubuntu-r-node1" in your example. The master's hostname is determined using Sys.info()['nodename'] by default, and if any of the workers can't resolve this name, they won't be able to create the socket connection to the master, and makeCluster will hang.
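To check whether name resolution really is the problem, something like the following might help. It is only a rough sketch: the helper name resolves is made up for illustration, and it relies on utils::nsl(), which exists only on Unix-alikes.

```r
# Unix-alikes only: utils::nsl() is not available on Windows.

# Run on the master: this is the name workers are given by default.
master_name <- Sys.info()[["nodename"]]
print(master_name)

# Run on each worker, passing the master's name printed above.
# nsl() returns the IP address as a character string, or NULL if the lookup fails.
# 'resolves' is a hypothetical helper, not part of any package.
resolves <- function(host) !is.null(utils::nsl(host))
resolves(master_name)
```

If resolves() returns FALSE on a worker for the master's name, that worker cannot connect back using the default settings.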
A common workaround for this problem is to use the makeCluster "master" option to specify the IP address of the machine where the master is executing. Here's a way to do that using the nsl function (which is not available on Windows) to look up the master's hostname on the master rather than on the workers:
cl <- makePSOCKcluster(c(rep('192.168.42.26', 2),
                         rep('192.168.42.32', 2)),
                       master = nsl(Sys.info()['nodename']),
                       outfile = '')
By specifying IP addresses for both the workers and the master, you have far fewer problems with DNS. In this example, the master will start the workers by ssh'ing to '192.168.42.26' and '192.168.42.32', and the workers will connect back to the master using socketConnection with the value returned by nsl(Sys.info()['nodename']).
Note that the makeCluster "port" option can also be important if the master has a firewall, since by default, the port is randomly chosen in the range 11000 to 11999.
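As a rough illustration of pinning the port, here is a minimal sketch. The port number 11234 is an arbitrary assumption (pick one your firewall allows), and it uses two local workers so that it runs without any remote hosts; with your VMs you would pass the worker addresses and the master option as above instead of 2.

```r
library(parallel)

# Hypothetical fixed port; choose one that is open on the master's firewall.
master_port <- 11234L

# Two local workers, all connecting back to the master on the fixed port.
cl <- makePSOCKcluster(2, port = master_port)

# Quick sanity check that the workers respond.
res <- parLapply(cl, 1:4, function(x) x^2)

stopCluster(cl)
```

With a fixed port, only that one port has to be opened on the master instead of the whole 11000 to 11999 range.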
Upvotes: 1
Reputation: 484
It seems that DNS must also work in both directions.
E.g., if the first host (192.168.42.26) in my example were named 'host1' and the second host (192.168.42.32) 'host2', then both
ssh host1
(from host2)
and
ssh host2
(from host1)
must work for the R cluster to run.
Upvotes: 0