postelrich

Reputation: 3486

How to find inputs of dask.delayed task?

Given a dask.delayed task, I want to get a list of all the inputs (parents) for that task.

For example,

from dask import delayed

@delayed
def inc(x):
    return x + 1

def inc_list(x):
    return [inc(n) for n in x]

task = delayed(sum)(inc_list([1, 2, 3]))
# task.parents  ???  <- what I would like to be able to write

This yields the following graph. How can I get the parents of sum#3 as a list [inc#1, inc#2, inc#3]?

[Image: task graph]

Upvotes: 1

Views: 229

Answers (1)

MRocklin

Reputation: 57251

Delayed objects don't store references to their inputs. However, you can recover them if you're willing to dig into the task graph a bit and reconstruct Delayed objects manually.

In particular, you can index into the .dask attribute with the delayed object's key:

>>> task.dask[task.key]
(<function sum>,
 ['inc-9d0913ab-d76a-4eb7-a804-51278882b310',
  'inc-2f0e385e-beef-45e5-b47a-9cf5d02e2c1f',
  'inc-b72ce20f-d0c4-4c50-9a88-74e3ef926dd0'])

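This shows the task definition (see Dask's graph specification): a tuple whose first element is the function to call and whose remaining elements are its arguments.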

The 'inc-...' values are other keys in the task graph. You can get these dependencies with the dask.core.get_dependencies function:

>>> from dask.core import get_dependencies
>>> get_dependencies(task.dask, task.key)
{'inc-2f0e385e-beef-45e5-b47a-9cf5d02e2c1f',
 'inc-9d0913ab-d76a-4eb7-a804-51278882b310',
 'inc-b72ce20f-d0c4-4c50-9a88-74e3ef926dd0'}
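To make the idea concrete, here is a minimal pure-Python sketch of what a dependency lookup does on a graph written in the tuple format shown above. It is a simplified stand-in, not dask's actual implementation: find_dependencies, the toy graph, and the short keys like 'inc-1' are all hypothetical names used only for illustration.

```python
def find_dependencies(graph, key):
    """Return the set of graph keys referenced by the task stored at `key`.

    Hypothetical helper: a simplified illustration of a dependency lookup,
    not dask.core.get_dependencies itself.
    """
    found = set()
    stack = [graph[key]]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            # Task tuples and argument lists may nest; look inside them.
            stack.extend(item)
        elif isinstance(item, dict):
            stack.extend(item.values())
        else:
            try:
                if item in graph:
                    found.add(item)
            except TypeError:
                pass  # unhashable values cannot be keys
    return found


def inc(x):
    return x + 1


# Toy graph mirroring the question: three inc tasks feeding one sum task.
graph = {
    "inc-1": (inc, 1),
    "inc-2": (inc, 2),
    "inc-3": (inc, 3),
    "sum-4": (sum, ["inc-1", "inc-2", "inc-3"]),
}

parents = find_dependencies(graph, "sum-4")
print(sorted(parents))  # ['inc-1', 'inc-2', 'inc-3']
```

The result is a set, so the order of the parents is not guaranteed; sort the keys or read them from the task definition itself if the argument order matters.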

From here you can construct new Delayed objects if you wish:

>>> from dask.delayed import Delayed
>>> parents = [Delayed(key, task.dask) for key in get_dependencies(task.dask, task.key)]
>>> parents
[Delayed('inc-b72ce20f-d0c4-4c50-9a88-74e3ef926dd0'),
 Delayed('inc-2f0e385e-beef-45e5-b47a-9cf5d02e2c1f'),
 Delayed('inc-9d0913ab-d76a-4eb7-a804-51278882b310')]

Upvotes: 1
