VMois
VMois

Reputation: 4698

Running process scheduler in Dask distributed

Local dask allows using process scheduler. Workers in dask distributed are using ThreadPoolExecutor to compute tasks. Is it possible to replace ThreadPoolExecutor with ProcessPoolExecutor in dask distributed? Thanks.

Upvotes: 1

Views: 515

Answers (1)

mdurant
mdurant

Reputation: 28673

The distributed scheduler allows you to work with any number of processes, via any of the deployment options. Each of these can have one or more threads. Thus, you have the flexibility to choose your favourite mix of threads and processes as you see fit.

The simplest expression of this is with the LocalCluster (same as Client() by default):

cluster = LocalCluster(n_workers=W, threads_per_worker=T, processes=True)

makes W workers with T threads each (which can be 1).

As things stand, the implementation of workers uses a thread pool internally, and you cannot swap in a process pool in its place.

Upvotes: 1

Related Questions