Marcos Heidemann

Reputation: 21

GrpcUnavailable when setting up a local Ray Cluster

When I connect a node to my head it gives me this message:

(gcs_server) gcs_server.cc:283: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:

I'm using Ray 1.13.0.

My head node is running on a WSL2 instance with all possible ports open and forwarded to it.

ray start --head --port 6379 --ray-client-server-port 10001 --redis-shard-ports 20000,20001,20002,20003,20004 --dashboard-port 8265 --node-manager-port 6380 --object-manager-port 6381 --worker-port-list=10000,10002,10003,10004 --dashboard-host 0.0.0.0
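A quick way to confirm from the other machine that these ports are actually reachable is a plain TCP connect test. The sketch below is only an illustration, not part of the original setup: the helper names `is_port_open` and `check_ports` are hypothetical, and the port list is copied from a subset of the flags in the command above. It only tests the worker-to-head direction; the head also has to be able to reach the worker.

```python
import socket

# Ports copied from the `ray start --head` command above (a subset).
HEAD_PORTS = [6379, 6380, 6381, 8265, 10001]


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=HEAD_PORTS):
    """Map each port to whether it accepted a TCP connection."""
    return {port: is_port_open(host, port) for port in ports}
```

Running `check_ports("MY_IP")` on the worker machine returns a dict of port to `True`/`False`. A port that shows `False` is either not forwarded into WSL2 or has nothing listening on it yet.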

On another PC running Windows (this also happens with another one running Ubuntu 20.04) I'm able to start Ray with:

ray start --address='MY_IP:6379'

And it gives me: Ray runtime started.

Now, as soon as I connect a node, it starts giving me the message:

(gcs_server) gcs_server.cc:283: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:

and if I try to run ray memory I get:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1660685529.361531380","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3134,"referenced_errors":[{"created":"@1660685529.361530960","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"

I've been at this for days, and nothing I find on Google makes any difference.

Upvotes: 2

Views: 1280

Answers (1)

Raphael Muema

Reputation: 11

I was having a similar issue and managed to solve it by specifying the node's public IP address. On the node, try a command similar to the following:
ray start --address='HEAD_IP:6379' --node-ip-address='NODE_PUBLIC_IP'

If you don't specify --node-ip-address, the node registers itself as 127.0.0.1, and the head node is then unable to communicate with it.
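If you are not sure which address to pass, one common trick is to ask the OS which local interface it would use to reach the head. The following is a rough sketch under that assumption; `outbound_ip` and `worker_start_command` are hypothetical helper names, not part of Ray. Note that behind NAT (for example inside WSL2) this returns the private address, which may differ from the public IP the head actually sees.

```python
import socket


def outbound_ip(head_host, head_port=6379):
    """Return the local IP the OS would use to reach head_host."""
    # Connecting a UDP socket sends no packets; it only makes the OS
    # pick a route, and therefore a local interface address.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((head_host, head_port))
        return s.getsockname()[0]


def worker_start_command(head_host, head_port=6379):
    """Build the `ray start` command shown above with the detected IP."""
    ip = outbound_ip(head_host, head_port)
    return (
        f"ray start --address='{head_host}:{head_port}' "
        f"--node-ip-address='{ip}'"
    )
```

Calling `print(worker_start_command("HEAD_IP"))` on the worker, with the real head address substituted, prints a command you can paste into the shell. If the printed address is 127.0.0.1 or a NAT-internal one, set the IP by hand instead.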

Hope this helps.

Upvotes: 1
