Andrew Reid

Reputation: 88

Setting up a dask distributed scheduler on two IP addresses?

I am new to dask, and have what appears to be a bit of an odd use-case, where I would like to set up a dask scheduler on a "bridging" machine with two network interfaces, such that clients can connect to one of the interfaces (the "front"), and workers will live on multiple machines connected to another interface (the "back"). The interfaces have separate IP addresses and hostnames.

Essentially, I want to reproduce the setup in this picture, where the brown and blue pieces have no route between them except through the machine with the scheduler on it. (The picture is from some old documentation for dask distributed, 0.7 I think, when things were apparently less settled than now.)

Everything is 64-bit Linux (Debian 8 "jessie"), and I'm working with version 0.14.0 of dask and 1.16.0 of distributed, installed in an anaconda environment.

The dask-scheduler command-line tool does not seem to have a way to do more than one hostname, which I think is what I want.

I can get the effect I want by SSH port-forwarding.

For example, suppose the relevant hosts and interfaces are named worker, scheduler-front, scheduler-back, and client. The two scheduler-* interfaces are different NICs on the same machine. There is a TCP route from client to scheduler-front, and one from scheduler-back to worker, but there is no route from client to worker, from scheduler-front to worker, or from scheduler-back to client.

Then, the following works (the leading bit below is meant to be a command-line prompt indicating which machine the command is run on, with '#' meaning the shell, and '>>>' meaning Python):

First, start a scheduler listening on the "back" of the bridge host:

scheduler# dask-scheduler --host scheduler-back

Second, start a worker and connect it to the scheduler in the ordinary way:

worker# dask-worker scheduler-back:8786

Third, forward localhost:8786 on the client to scheduler-back:8786 on the scheduler machine, ssh-ing in through the scheduler-front interface:

client# ssh -L 8786:scheduler-back:8786 scheduler-front

Finally, start up the client on the client machine, and connect to the near end of the forwarded port whose other end can see the scheduler.

client>>> from distributed import Client
client>>> cl = Client('127.0.0.1:8786')
client>>> ...

As I say, this works: I can do maps and gathers and get results.
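Before creating the Client, it can help to confirm that the near end of the forwarded port is actually accepting connections. The following is a minimal sketch using only the Python standard library; the helper name port_is_open is hypothetical and is not part of dask or distributed, and the address 127.0.0.1:8786 is the one assumed in the steps above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for checking the local end of an SSH tunnel;
    it only tests TCP reachability, not that a dask scheduler is answering.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts
        return False


# Example usage on the client machine, after starting the ssh -L tunnel:
#   port_is_open("127.0.0.1", 8786)
```

A True result only shows that something is listening on the forwarded port; if the far end of the tunnel cannot reach the scheduler, the failure typically shows up later, when the Client tries to talk to it.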

But I can't help thinking that I'm over-doing it, and maybe I missed something simple that allows multi-homed schedulers. Private sub-nets aren't all that strange; they come up in the context of containers and clusters.

Is there a smarter way to do this?

In case it's of interest, the reason for not using the cluster queuing system is that the target "worker" machine is the one with a GPU, and we are having some difficulty getting the queuing system to allocate it properly, so at the moment, that machine is working outside of the queuing system. We will eventually solve that problem, but for now, we're trying to do this.

Also, for completeness, the reason for not having the client be on the scheduler machine is that, in our scenario, the client needs to do visualizations, and the scheduler is a cluster head-node that's in a rack in the machine room and is not physically accessible to users.

Upvotes: 3

Views: 1825

Answers (1)

Antoine P.

Reputation: 4315

If you don't specify any --host to dask-scheduler, it will listen on all interfaces by default. For instance:

$ dask-scheduler 
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO -   Scheduler at:   tcp://192.168.1.68:8786
distributed.scheduler - INFO -        http at:              0.0.0.0:9786
distributed.scheduler - INFO -       bokeh at:              0.0.0.0:8788
distributed.bokeh.application - INFO - Web UI: http://127.0.0.1:8787/status/
distributed.scheduler - INFO - -----------------------------------------------

and:

$ netstat -tnlp | \grep 8786
tcp        0      0 0.0.0.0:8786            0.0.0.0:*               LISTEN      23969/python    
tcp6       0      0 :::8786                 :::*                    LISTEN      23969/python  

So you can then connect from the subnetwork you want, using the right IP (v4 or v6) address to contact the scheduler. Your workers might use tcp://192.168.1.68:8786 and your clients tcp://10.1.2.3:8786, for instance.
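To illustrate why a listener on 0.0.0.0 is reachable through any of the host's addresses, here is a minimal sketch with a plain TCP socket. It is not the scheduler's own code; the function name listen_on_all_interfaces is hypothetical, and the loopback address stands in for one of the machine's interfaces.

```python
import socket


def listen_on_all_interfaces(port=0):
    """Open a TCP listener bound to the IPv4 wildcard address 0.0.0.0.

    Hypothetical helper for illustration only. With port=0 the OS picks
    a free port, which can be read back with getsockname().
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to the wildcard address accepts connections arriving
    # on any local IPv4 address, whichever NIC they come in on.
    srv.bind(("0.0.0.0", port))
    srv.listen(5)
    return srv


# Example usage:
#   srv = listen_on_all_interfaces()
#   port = srv.getsockname()[1]
#   client = socket.create_connection(("127.0.0.1", port))
#   conn, peer = srv.accept()
#   conn.getsockname()[0] is the local address the client actually used
```

On the accepted connection, getsockname() reports the specific local address the peer connected to, so workers and clients arriving on different interfaces are each served on the address they dialed.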

If you want to listen on more than one interface but not all of them, however, this is not currently possible.

Upvotes: 1
