Reputation: 783
When I parallelize a for loop with dask, it runs slower than the plain, non-parallel version. I'm just following the introductory example from the dask tutorial, but on my machine the dask version takes longer. What am I doing wrong?
In [1]: import numpy as np
...: from dask import delayed, compute
...: import dask.multiprocessing
In [2]: a10e4 = np.random.rand(10000, 11).astype(np.float16)
...: b10e4 = np.random.rand(10000, 11).astype(np.float16)
In [3]: def subtract(a, b):
...:     return a - b
In [4]: %%timeit
...: results = [subtract(a10e4, b10e4[index]) for index in range(len(b10e4))]
1 loop, best of 3: 10.6 s per loop
In [5]: %%timeit
...: values = [delayed(subtract)(a10e4, b10e4[index]) for index in range(len(b10e4)) ]
...: resultsDask = compute(*values, get=dask.multiprocessing.get)
1 loop, best of 3: 14.4 s per loop
Upvotes: 5
Views: 2653
Reputation: 57251
Two issues:
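As a rough sketch only: assuming the slowdown comes from per-task scheduling overhead and from moving data between processes (both are assumptions here), one thing to try is batching many rows into each task so that every task does a meaningful amount of work. The helper `subtract_batch`, the `batch_size` value, and the smaller array sizes below are hypothetical and chosen for illustration. The threaded scheduler is also an assumption. It avoids copying arrays between processes, and recent dask releases select it with `scheduler="threads"` rather than the older `get=` keyword.

```python
import numpy as np
from dask import delayed, compute


def subtract(a, b):
    return a - b


def subtract_batch(a, b_rows):
    # Hypothetical helper: process many rows inside a single task so the
    # per-task overhead is paid once per batch instead of once per row.
    return [subtract(a, row) for row in b_rows]


# Smaller arrays than in the question so the sketch runs quickly.
a = np.random.rand(1000, 11).astype(np.float16)
b = np.random.rand(1000, 11).astype(np.float16)

batch_size = 250  # illustrative value, not a tuned recommendation
batches = [b[i:i + batch_size] for i in range(0, len(b), batch_size)]

# One delayed task per batch (4 tasks here instead of 1000).
tasks = [delayed(subtract_batch)(a, batch) for batch in batches]

# Threads share memory, so the arrays are not serialized between workers.
nested = compute(*tasks, scheduler="threads")

# Flatten the per-batch lists back into one list, in the original row order.
results = [r for batch in nested for r in batch]
```

Whether this actually beats the plain loop depends on the workload, so it is worth timing both versions with `%%timeit` on the real data.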
Upvotes: 6