Reputation: 1396
Is there any way to prevent dask/distributed from cancelling queued and executing futures when the client is closed?
I want to use a Jupyter notebook to kick off some very long-running simulations with distributed, close the notebook, and retrieve the results some time later.
Upvotes: 1
Views: 542
Reputation: 28683
You can use the "publish" mechanism to keep references to some data around on the scheduler for later retrieval in another client. Because the scheduler itself holds those references, the futures are not released or cancelled when the original client disconnects. Two forms exist which do the same thing with different syntax:
>>> client.publish_dataset(mydata=f)
Here f is a future, a list of futures, or a dask collection (dataframe, etc.).
In another session:
>>> client.list_datasets()
['mydata']
>>> client.get_dataset('mydata')
<same thing as f>
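Applied to your scenario, the first session might look like the following. This is a minimal sketch: the scheduler address and the run_simulation function are placeholders for whatever your cluster and workload actually are.
>>> from dask.distributed import Client
>>> client = Client('scheduler-address:8786')       # connect to an existing cluster
>>> futures = client.map(run_simulation, range(100))  # kick off the long-running work
>>> client.publish_dataset(simulations=futures)     # scheduler now holds the references
>>> client.close()                                  # published futures survive the disconnect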
The alternative, perhaps simpler, syntax looks like this:
>>> client.datasets['mydata'] = f
>>> list(client.datasets)
['mydata']
>>> client.datasets['mydata']
<same thing as f>
To remove the references and allow the data to be cleared from the cluster (if no other client needs them), use client.unpublish_dataset('mydata') or del client.datasets['mydata'].
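Later, from a fresh notebook or script, you can reconnect, collect the results, and release the references. Again a sketch, assuming the same hypothetical scheduler address and dataset name as above:
>>> from dask.distributed import Client
>>> client = Client('scheduler-address:8786')     # reconnect from a new session
>>> futures = client.get_dataset('simulations')   # same futures published earlier
>>> results = client.gather(futures)              # blocks until the work finishes
>>> client.unpublish_dataset('simulations')       # let the cluster reclaim the memory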
Upvotes: 2