Stefano Barone
Stefano Barone

Reputation: 221

Workers definition in DASK python library, why more workers than cpu cores

I'm really confused about what a worker is.

In general I would say a node in a dask cluster which can compute tasks according with directives of the scheduler. However, I thought that a single node could be a cpu core and the number of threads per worker at most the number of thread per cpu core. Working on a single machine I can set a number of workers grater than the CPU cores present in my laptop and a number of threads per worker larger than a number of thread per cpu core.

So what is actually a worker when I set a local cluster?

It refers to something physical on my machine?

Why no error comes out?

enter image description here

Upvotes: 1

Views: 1380

Answers (1)

mdurant
mdurant

Reputation: 28684

You can have as many threads running on your system as you like - because you have a modern multitasking operating system. The OS takes care of waking threads and running them in the cores of your CPU, and in your case, at most four threads can be running simultaneously. Therefore, it is probably not in your interests to have more than four dask worker threads in total.

You can choose how many workers (read: processes) and threads are appropriate for your application, where processes are not mutually blocked by the GIL, but threads can efficiently share memory.

Upvotes: 4

Related Questions