Reputation: 8915
Can dask distributed handle uneven worker nodes?
For example, if there is a dask worker on a 4 core computer and a second dask worker on a 2 core computer, will all 6 cores be utilised?
Also is it a strict requirement for dask to distribute the work amongst all the computers? That is, can dask choose to send all the work to one computer because it determines that there would be too much communication overhead if distributed?
Upvotes: 2
Views: 937
Reputation: 57271
Can dask distributed handle uneven worker nodes?
Yes. Nodes can differ in number of cores, amount of memory, or even special hardware such as GPUs, and Dask has mechanisms to handle this.
For example, if there is a dask worker on a 4 core computer and a second dask worker on a 2 core computer, will all 6 cores be utilised?
Yes, the Dask scheduler automatically balances load in proportion to the number of cores on each machine. If this were to misbehave for some reason (for example, if the reported number of cores were incorrect), the work stealing mechanism would rebalance the work anyway.
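To illustrate what balancing relative to core count means, here is a toy model in plain Python. It is not Dask's scheduler code; `assign_tasks` and the worker names are hypothetical, and it assumes every task takes the same time. Each task goes to the worker with the fewest tasks per thread so far.

```python
def assign_tasks(nthreads_by_worker: dict, n_tasks: int) -> dict:
    """Toy model (not Dask internals): give each task to the worker
    whose queue is shortest relative to its thread count."""
    counts = {worker: 0 for worker in nthreads_by_worker}
    for _ in range(n_tasks):
        # Load per thread; ties go to the first worker in the dict.
        target = min(counts, key=lambda w: counts[w] / nthreads_by_worker[w])
        counts[target] += 1
    return counts


# Hypothetical cluster: one 4-core machine and one 2-core machine.
shares = assign_tasks({"four-core": 4, "two-core": 2}, n_tasks=60)
print(shares)  # {'four-core': 40, 'two-core': 20}
```

In this simplified model the 4 core machine ends up with twice as many tasks as the 2 core machine, so all 6 cores stay busy.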
Also is it a strict requirement for dask to distribute the work amongst all the computers? That is, can dask choose to send all the work to one computer because it determines that there would be too much communication overhead if distributed?
The Dask scheduler tracks the size of every intermediate result and takes those sizes, along with expected runtimes, into account when deciding where to run tasks and whether to move data between machines. There are certainly cases where Dask will decide that workers should remain idle because intermediate results are too expensive to communicate.
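The trade-off can be sketched with a simplified cost model. This is an assumption-laden illustration, not the scheduler's real objective function: `pick_worker`, the `occupancy` field (seconds of queued work), and the bandwidth figure are all hypothetical. The idea is to estimate when a task could start on each worker as queue wait plus the time to transfer its input, then pick the earliest.

```python
def pick_worker(workers: dict, dep_nbytes: float, dep_location: str,
                bandwidth: float) -> str:
    """Toy model: choose the worker with the earliest estimated start time.

    workers      maps name -> {"nthreads": int, "occupancy": seconds queued}
    dep_nbytes   size of the task's input in bytes
    dep_location name of the worker that already holds the input
    bandwidth    assumed network bandwidth in bytes per second
    """
    def estimated_start(name: str) -> float:
        queue_wait = workers[name]["occupancy"] / workers[name]["nthreads"]
        # No transfer is needed on the worker that already holds the data.
        transfer = 0.0 if name == dep_location else dep_nbytes / bandwidth
        return queue_wait + transfer

    return min(workers, key=estimated_start)


cluster = {
    "a": {"nthreads": 4, "occupancy": 8.0},  # busy: about 2 s of queued work per thread
    "b": {"nthreads": 2, "occupancy": 0.0},  # idle
}

# A 1 GB input on "a" would take about 10 s to move at 100 MB/s, so "b" stays idle.
big = pick_worker(cluster, dep_nbytes=1e9, dep_location="a", bandwidth=100e6)
# A 1 MB input moves in about 0.01 s, so the idle worker "b" wins.
small = pick_worker(cluster, dep_nbytes=1e6, dep_location="a", bandwidth=100e6)
print(big, small)  # a b
```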
If you wish, you can also control this manually (though the automatic heuristics should be fine). See http://distributed.readthedocs.io/en/latest/locality.html
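As a rough sketch of manual control, the helper below wraps `Client.submit` with its `workers=` and `allow_other_workers=` keywords, which are described on the locality page. The helper name `submit_pinned` and the worker address in the usage comment are hypothetical; the client is passed in, so the snippet itself does not import Dask.

```python
def submit_pinned(client, func, *args, worker: str, strict: bool = True):
    """Submit func(*args) and restrict it to one worker.

    client  a dask.distributed Client (or anything with a compatible submit)
    worker  address or hostname of the target worker
    strict  if False, the restriction is only a preference and the
            scheduler may still run the task on another worker
    """
    return client.submit(
        func,
        *args,
        workers=[worker],
        allow_other_workers=not strict,
    )


# Usage with a real cluster (scheduler and worker addresses are placeholders):
#   from dask.distributed import Client
#   client = Client("tcp://scheduler-host:8786")
#   future = submit_pinned(client, sum, [1, 2, 3], worker="tcp://192.168.0.10:40000")
#   future.result()
```

Pinning tasks like this overrides the scheduler's own placement decisions, so it is probably best reserved for cases such as tasks that need hardware present on only one machine.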
There is more information on how the scheduler makes these decisions at http://distributed.readthedocs.io/en/latest/scheduling-policies.html
Upvotes: 4