suvayu

Reputation: 4664

How can I combine sequential as well as parallel execution of delayed function calls?

I am stuck in a strange place. I have a bunch of delayed function calls that I want to execute in a certain order. While executing in parallel is trivial:

res = client.compute(myfuncs)  # myfuncs: a list of delayed objects
res = client.gather(res)

I can't seem to find a way to execute them in sequence, in a non-blocking way.

Here's a minimal example:

import numpy as np
from time import sleep
from datetime import datetime

from dask import delayed
from dask.distributed import LocalCluster, Client


@delayed
def dosomething(name):
    res = {"name": name, "beg": datetime.now()}
    sleep(np.random.randint(10))  # simulate work of random duration
    res.update(rand=np.random.rand())
    res.update(end=datetime.now())
    return res


seq1 = [dosomething(name) for name in ["foo", "bar", "baz"]]  # should run in sequence
par1 = dosomething("whaat")
par2 = dosomething("ahem")
pipeline = [seq1, par1, par2]  # seq1, par1, and par2 should run in parallel

Given the above example, I would like to run seq1, par1, and par2 in parallel, but the constituents of seq1 ("foo", "bar", and "baz") in sequence.

Upvotes: 0

Views: 232

Answers (1)

mdurant

Reputation: 28684

You could definitely cheat and add an optional dependency to your function as follows:

@delayed
def dosomething(name, *args):
    ...

That way you can make tasks depend on one another, even though you don't actually use one task's result in the next call:

inputs = ["foo", "bar", "baz"]
seq1 = [dosomething(inputs[0])]
for bit in inputs[1:]:
    # the previous task is passed as a dummy argument, forcing the ordering
    seq1.append(dosomething(bit, seq1[-1]))
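
Putting it together (a sketch, continuing from the chain above and assuming a client connected to a LocalCluster as in your imports), a single compute call then runs seq1 in order while par1 and par2 run alongside it:

client = Client(LocalCluster())  # assumed setup, matching the question's imports

par1 = dosomething("whaat")
par2 = dosomething("ahem")

futures = client.compute(seq1 + [par1, par2])  # non-blocking, returns futures
results = client.gather(futures)               # blocks only when you collect results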

Alternatively, you can read about the distributed scheduler's "futures" interface, whereby you can monitor the progress of tasks in real time.
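
For example, here is a minimal sketch of the same ordering trick with futures; note that client.submit takes a plain (undecorated) function, so this variant of dosomething drops @delayed:

import numpy as np
from time import sleep
from datetime import datetime
from dask.distributed import Client

def dosomething(name, *args):  # plain function; submit() handles the wrapping
    res = {"name": name, "beg": datetime.now()}
    sleep(np.random.randint(10))
    res.update(rand=np.random.rand(), end=datetime.now())
    return res

client = Client()  # assumes a default LocalCluster

inputs = ["foo", "bar", "baz"]
seq = [client.submit(dosomething, inputs[0])]
for bit in inputs[1:]:
    # passing the previous Future as a dummy argument enforces the ordering
    seq.append(client.submit(dosomething, bit, seq[-1]))

par1 = client.submit(dosomething, "whaat")  # independent, runs in parallel
par2 = client.submit(dosomething, "ahem")

results = client.gather(seq + [par1, par2])  # blocks only at this point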

Upvotes: 1
