evilkonrex

Reputation: 255

Dask worker graceful task failure

When I run dask.distributed workers, any exception thrown in the task function gets propagated to the scheduler and kills the whole job. Is there a way to fail the task gracefully so that the scheduler takes care of retrying it (potentially on another worker)?

Upvotes: 1

Views: 789

Answers (1)

MRocklin

Reputation: 57271

The Dask.distributed scheduler currently interprets an exception raised by a task as the task's result: the future's status becomes an error and the exception is re-raised on the client. Automatic retries are not supported (as of August 2017), though this has been frequently requested, so I would not be surprised to see it change in the near future.
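For illustration, here is a minimal sketch of that behavior, assuming a local cluster started with Client() and a hypothetical always-failing function named flaky:

from dask.distributed import Client

client = Client()  # start/connect to a local cluster for illustration

def flaky():
    raise ValueError("boom")  # simulate a task that always fails

future = client.submit(flaky)
# The future's status becomes 'error'; calling result() re-raises
# the worker-side exception here on the client instead of retrying.
try:
    future.result()
except ValueError as e:
    print("task failed with:", e)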

In the meantime we recommend adding retry logic within your task, for example (with do_work standing in for your actual task body):

def f(*args, **kwargs):
    n_retries = 3  # choose a retry budget that suits your workload
    for i in range(n_retries):
        try:
            return do_work(*args, **kwargs)  # placeholder for your task body
        except Exception:
            if i == n_retries - 1:
                raise  # retries exhausted; surface the last exception

future = client.submit(f, *args, **kwargs)
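Re-raising after the final attempt keeps the failure visible: if every retry fails, future.result() on the client still raises the last exception instead of silently returning None.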

Upvotes: 1
