Reputation: 189
UPDATE: I have copied the SSH keys to all my machines and they can now communicate without a password, but I still need to specify username@hostname instead of just the hostname. I tried several methods with no luck.
Method 1: I ran the following in my Jupyter notebook:
from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["localhost", "username@hostname"],
    connect_options={"known_hosts": None},
    worker_options={"nthreads": 2},
)
client = Client(cluster)
I understand that connect_options is what gets passed to the asyncssh library to make the SSH connection, so I thought known_hosts was OK since it looks like the authorized_keys file in my .ssh directory. However, I keep getting the following error:
~/anaconda3/lib/python3.7/concurrent/futures/thread.py in run(self)
     55
     56         try:
     57             result = self.fn(*self.args, **self.kwargs)
     58         except BaseException as exc:
     59             self.future.set_exception(exc)

~/anaconda3/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    750     # and socket type values to enum constants.
    751     addrlist = []
--> 752     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    753         af, socktype, proto, canonname, sa = res
    754         addrlist.append((_intenum_converter(af, AddressFamily),

gaierror: [Errno -2] Name or service not known
The second method I tried was dask-ssh, which I ran from the command line:
dask-ssh localhost username@hostname username@hostnameb --nprocs 10
However, when I open the dashboard I don't see any workers from the remote machines, only the 10 workers from localhost.
Please help. I have read tutorials, searched Stack Overflow, and even tried Kubernetes (microk8s, k3s, minikube, kubeadm) and Apache Hadoop/YARN, with many, many hours of failed results; dask-ssh seems to be my only hope. I also like Dask because the dashboard looks better than Hadoop's (that yellow elephant kind of bugs me).
PREVIOUS: I'm trying to create a Dask cluster between my machines at home using a Jupyter notebook. I understand the concepts behind schedulers, workers, and clients. The Dask docs provide the following example, which I'm having a hard time getting to work:
from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["localhost", "localhost", "localhost", "localhost"],
    connect_options={"known_hosts": None},
    worker_options={"nthreads": 2},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
)
client = Client(cluster)
My question is: how do I configure SSHCluster to create a cluster between different machines? How do I set the IP address, username, and password? I understand there are better options out there, like Hadoop/YARN or Kubernetes, but I wanted to understand the SSH cluster concept through a Jupyter notebook.
Thanks,
Upvotes: 1
Views: 2133
Reputation: 28673
"How do I set the IP address, username, and password?"
The documentation tells you what to do.
"...SSH cluster concept through Jupyter Notebook"
Use of a notebook is immaterial here; you are executing Python just the same.
"there are better options out there like Hadoop/Yarn, Kubernetes"
Many, many people use SSH because it is very simple, but it does leave you to manage any orchestration yourself (e.g., making sure the machines are on the same network and can communicate, and managing the software environments).
EDIT (in response to the updated question):
Reading the asyncssh documentation, you want to pass an option called username= in connect_options (see here). Unfortunately, asyncssh does not currently support using a ~/.ssh config file to define targets, so if you need different options for each server, you are out of luck.
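A minimal sketch of what that could look like, assuming the scheduler runs on your local machine and the workers on two remote hosts ("hostname", "hostnameb", and "yourusername" are placeholders, not values from your setup):

from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    # Bare hostnames or IPs here, no user@ prefix
    ["localhost", "hostname", "hostnameb"],
    connect_options={
        "known_hosts": None,         # skip host-key verification
        "username": "yourusername",  # forwarded to asyncssh's connect()
    },
    worker_options={"nthreads": 2},
)
client = Client(cluster)

With username supplied there, the host list itself should contain plain hostnames or IP addresses rather than user@host strings.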
Note that if you are doing something very custom, you do not need to use dask-ssh at all; you can log in and run Dask explicitly on each server.
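For example, you could run dask-scheduler on one machine and dask-worker tcp://<scheduler-ip>:8786 on each of the others (8786 is the scheduler's default port), then connect from your notebook; the address below is just a placeholder:

from dask.distributed import Client

# Assumes `dask-scheduler` is already running on 192.168.1.10 (placeholder
# address) and `dask-worker tcp://192.168.1.10:8786` on each worker machine.
client = Client("tcp://192.168.1.10:8786")
print(client.scheduler_info())  # should list the manually started workers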
Upvotes: 2