Reputation: 682
I have a function f that takes as input a variable x, which is a large np.ndarray (length 20000). A single execution of f is very fast (about 5 ms).
A for loop over a matrix M with many rows,

    for x in M:
        f(x)

takes about 5 times longer than parallelizing with multiprocessing:

    import multiprocessing

    with multiprocessing.Pool() as pool:
        pool.map(f, M)
I have tried to parallelize with dask, but it loses even against sequential execution. A related post is here, but the accepted answer doesn't work for me. I have tried many things, such as partitioning the data as the best practices suggest, or using dask.bag (something along the lines of the sketch below). I'm running Dask on a local machine with 4 physical cores.
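For reference, a dask.bag version of the loop above would look roughly like the following (the partition count and scheduler here are illustrative, not necessarily exactly what I ran):

    # Illustrative only: roughly the shape of the dask.bag attempt described above.
    import dask.bag as db

    bag = db.from_sequence(list(M), npartitions=4)       # one partition per core
    results = bag.map(f).compute(scheduler="processes")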
So the question is: how do I use dask with short tasks that take large data as input?
Upvotes: 0
Views: 164
Reputation: 28673
Firstly, the dask documentation makes clear some contraindications: every task carries scheduling overhead (somewhere on the order of a fraction of a millisecond per task), so a workload made of many very short tasks is exactly the kind of thing Dask does not speed up well.
Since we don't know much about what you are doing or about your system, here is a guess at why dask is slower than multiprocessing for you. When you use multiprocessing.Pool, the system probably created the worker processes via fork, and the array was copied (or copy-on-write duplicated) into each process, so the workers could access it directly. Dask requires threads and event loops to run, so it is not safe to use with fork. This means that when you want data living in the client to be processed in a worker, it must be serialised and sent over IPC. That is very likely the cause of your slowdown.
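If per-task serialisation is indeed the bottleneck, one common mitigation, shown here only as a sketch (it assumes dask.distributed, and f and M are placeholders for your function and matrix), is to batch many rows into each task so the scheduling and IPC costs are paid once per chunk rather than once per row:

    # Sketch only: batching rows to amortise per-task overhead and IPC cost.
    import numpy as np
    from dask.distributed import Client

    def f(x):
        return x.sum()  # stand-in for the real ~5 ms function

    def f_batch(rows):
        # Many rows per task: scheduling and serialisation happen per chunk.
        return [f(x) for x in rows]

    if __name__ == "__main__":
        M = np.random.rand(1000, 20000)

        with Client(n_workers=4, threads_per_worker=1) as client:
            chunks = np.array_split(M, 4)            # one chunk per worker
            futures = client.map(f_batch, chunks)    # each chunk is serialised once
            results = [r for batch in client.gather(futures) for r in batch]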
Upvotes: 2