Arman

Reputation: 4636

Are there any limits on the number of Dask workers/cores/threads?

I am seeing performance degradation in my data analysis when I go beyond 25 workers, each with 192 threads. Are there any limits on the scheduler? There is no visible load on communication (InfiniBand is used), CPU, or RAM. For example, initially I have 170K HDF5 files on Lustre:

import dask.dataframe as dd

ddf = dd.read_hdf(hdf5files, key="G18", mode="r")
ddf.repartition(npartitions=4096).to_parquet(splitspath + "gdr3-input-cache")

The code runs slower on 64 workers than on 25. It looks like the scheduler is heavily loaded during the initial task-graph construction phase.

EDIT: dask-2021.06.0, distributed-2021.06.0

Upvotes: 0

Views: 612

Answers (1)

mdurant

Reputation: 28684

There are many potential bottlenecks. Here are some hints.

Yes, the scheduler is a single process through which all tasks must pass, and it adds a per-task overhead (<1 ms) just to update its internal state and send the task out to a worker. So, if you are pushing many tasks per second, that overhead becomes a larger fraction of the total time.
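For a rough sense of scale, you can count the tasks a graph will push through the scheduler and turn that into a lower bound on scheduler-side time. A minimal sketch, using the hdf5files variable from the question and assuming roughly 1 ms per task:

import dask.dataframe as dd

# Count the tasks the scheduler has to process for this graph.
ddf = dd.read_hdf(hdf5files, key="G18", mode="r")
ntasks = len(ddf.__dask_graph__())  # at least one task per input file
print(ntasks, "tasks")

# At ~1 ms of scheduler overhead per task, this time is spent on the
# scheduler regardless of how many workers you add.
print("~%.0f s of scheduler overhead" % (ntasks * 1e-3))

With ~170K input files, that alone is on the order of minutes of scheduler work spread over the run.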

Similarly, if you have a lot of workers, you will have a lot of network traffic for both distribution of tasks and any data shuffling between workers. More workers, more traffic.
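One way to cut both the task count and the inter-worker traffic from the repartition is to read the small files in batches, so the graph starts out with roughly the partition count you want and the repartition step disappears. A sketch, assuming hdf5files is a plain Python list of paths and each batch fits comfortably in memory:

import pandas as pd
import dask
import dask.dataframe as dd

@dask.delayed
def read_batch(paths):
    # One task reads many small HDF5 files into a single pandas partition.
    return pd.concat([pd.read_hdf(p, key="G18", mode="r") for p in paths])

batch = 64  # ~170K files / 64 ≈ 2700 partitions; tune for your data sizes
batches = [hdf5files[i:i + batch] for i in range(0, len(hdf5files), batch)]
ddf = dd.from_delayed([read_batch(b) for b in batches])
ddf.to_parquet(splitspath + "gdr3-input-cache")

Fewer, larger partitions mean fewer tasks for the scheduler and less data movement between workers.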

Thirdly, Python uses a global lock, the GIL, when running code. Even when your tasks are GIL-friendly (e.g., array/dataframe ops), threads still need the GIL at times, and with 192 threads per worker this contention can degrade performance.
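If GIL contention is in play, it usually helps to split each node into several worker processes with fewer threads each rather than one process with 192 threads (the equivalent command-line knobs are --nprocs and --nthreads on dask-worker). A sketch of the idea with the Python API on a single machine; the exact split is just a starting point to tune:

from dask.distributed import Client, LocalCluster

# Several processes with fewer threads each: the heavy array/dataframe
# work releases the GIL, but per-task bookkeeping still needs it, so
# spreading threads across processes reduces contention.
cluster = LocalCluster(n_workers=8, threads_per_worker=24)
client = Client(cluster)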

Finally, you say you are using Lustre, so you have many tasks hitting network storage simultaneously, which has its own limits for both metadata access and data traffic.
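To see which of these is actually the bottleneck in your case, it is worth capturing a performance report for the job and checking where the time goes (scheduler administration, transfer, disk, compute). A minimal sketch wrapping the code from the question:

import dask.dataframe as dd
from dask.distributed import performance_report

# Writes an HTML report with the task stream, transfer and worker profiles.
with performance_report(filename="dask-report.html"):
    ddf = dd.read_hdf(hdf5files, key="G18", mode="r")
    ddf.repartition(npartitions=4096).to_parquet(splitspath + "gdr3-input-cache")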

Upvotes: 1
