JRR
JRR

Reputation: 6152

firing a sequence of parallel tasks

For this dask code:

def inc(x): 
  return x + 1

for x in range(5):
  array[x] = delay(inc)(x)

I want to access all the elements in array by executing the delayed tasks. But I can't call array.compute() since array is not a function. If I do

for x in range(5):
  array[x].compute()

then does each task gets executed in parallel or does a[1] get fired only after a[0] terminates? Is there a better way to write this code?

Upvotes: 1

Views: 83

Answers (2)

Alex Hall
Alex Hall

Reputation: 36013

It's easy to tell if things are executing in parallel if you force them to take a long time. If you run this code:

from time import sleep, time
from dask import delayed

start = time()

def inc(x):
    sleep(1)
    print('[inc(%s): %s]' % (x, time() - start))
    return x + 1

array = [0] * 5
for x in range(5):
    array[x] = delayed(inc)(x)

for x in range(5):
    array[x].compute()

It becomes very obvious that the calls happen in sequence. However if you replace the last loop with this:

delayed(array).compute()

then you can see that they are in parallel. On my machine the output looks like this:

[inc(1): 1.00373506546]
[inc(4): 1.00429320335]
[inc(2): 1.00471806526]
[inc(3): 1.00475406647]
[inc(0): 2.00795912743]

Clearly the first four tasks it executed were in parallel. Presumably the default parallelism is set to the number of cores on the machine, because for CPU intensive tasks it's not generally useful to have more.

Upvotes: 0

MRocklin
MRocklin

Reputation: 57261

You can use the dask.compute function to compute many delayed values at once

from dask import delayed, compute

array = [delayed(inc)(i) for i in range(5)]
result = compute(*array)

Upvotes: 1

Related Questions