Reputation: 11
I'm trying to set up a Ray cluster for parallel processing. I have 3 on-premise machines, each with 12 CPUs, and each actor is assigned 1 CPU. I'm deploying the head manually with:
ray start --head --port=... --redis-shard-ports=... --node-manager-port=... --object-manager-port=... --min-worker-port=... --max-worker-port=... --ray-client-server-port=... --gcs-server-port=... --num-cpus=12
and each worker with:
ray start --address='<HEAD_IP>' --redis-password='...' --node-manager-port=... --object-manager-port=... --min-worker-port=... --max-worker-port=... --dashboard-port=... --gcs-server-port=... --num-cpus=12
Each worker uses a hefty amount of memory. The issue is that Ray keeps assigning workers to the head node until it runs out of memory and crashes, while the other worker nodes sit idle.
Upvotes: 1
Views: 1005
Reputation: 1448
It would help to know what workload you're trying to run, but in general you can encourage tasks to be spread out more evenly via a scheduling strategy.
import time
import ray

ray.init(address="auto")  # connect to the existing cluster

@ray.remote(scheduling_strategy="SPREAD")
def my_task():
    print("Running")
    time.sleep(10)

ray.get([my_task.remote() for _ in range(6)])
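Since your workload uses actors with 1 CPU each, the same option can be passed on an actor class. Here's a minimal sketch, assuming a recent Ray version that supports scheduling_strategy on actors; the Worker class and its run method are just illustrative:

import socket
import ray

ray.init(address="auto")

# SPREAD works on actor classes the same way it does on tasks.
@ray.remote(num_cpus=1, scheduling_strategy="SPREAD")
class Worker:
    def run(self):
        # return the hostname so you can see which node each actor landed on
        return socket.gethostname()

actors = [Worker.remote() for _ in range(6)]
print(ray.get([a.run.remote() for a in actors]))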
Note that spread scheduling has its tradeoffs. In some cases it may be good to spread your tasks to avoid noisy neighbors or to increase fault tolerance. This comes at the cost that you're more likely to have to transfer data between machines if other tasks depend on the output of these tasks.
Upvotes: 2