Reputation: 6152
For this dask code:
def inc(x):
return x + 1
for x in range(5):
array[x] = delay(inc)(x)
I want to access all the elements in array
by executing the delayed tasks. But I can't call array.compute()
since array
is not a function. If I do
for x in range(5):
array[x].compute()
then does each task gets executed in parallel or does a[1]
get fired only after a[0]
terminates? Is there a better way to write this code?
Upvotes: 1
Views: 83
Reputation: 36013
It's easy to tell if things are executing in parallel if you force them to take a long time. If you run this code:
from time import sleep, time
from dask import delayed
start = time()
def inc(x):
sleep(1)
print('[inc(%s): %s]' % (x, time() - start))
return x + 1
array = [0] * 5
for x in range(5):
array[x] = delayed(inc)(x)
for x in range(5):
array[x].compute()
It becomes very obvious that the calls happen in sequence. However if you replace the last loop with this:
delayed(array).compute()
then you can see that they are in parallel. On my machine the output looks like this:
[inc(1): 1.00373506546]
[inc(4): 1.00429320335]
[inc(2): 1.00471806526]
[inc(3): 1.00475406647]
[inc(0): 2.00795912743]
Clearly the first four tasks it executed were in parallel. Presumably the default parallelism is set to the number of cores on the machine, because for CPU intensive tasks it's not generally useful to have more.
Upvotes: 0
Reputation: 57261
You can use the dask.compute
function to compute many delayed values at once
from dask import delayed, compute
array = [delayed(inc)(i) for i in range(5)]
result = compute(*array)
Upvotes: 1