Reputation: 573
I don't have browser access to the lab environment, and the available dask extension for JupyterLab hasn't worked for me so far. I want to be able to see progress and performance data for my dask projects, but no luck so far.
compute() sometimes takes hours (that's OK, there is a lot of data), and I feel blind through the process.
Do you know if there is anything supported by JupyterLab 1.2? No luck with ProgressBar either. The instance is on AWS SageMaker, so I cannot access the web dashboard.
Thanks!
Upvotes: 3
Views: 636
Reputation: 16551
It is possible to save most of the dashboard information to a file with performance_report (see docs):
from dask.distributed import Client, performance_report
import time

client = Client()

def f(x):
    time.sleep(x)
    return x

# writes dask-report.html by default; pass filename= to change it
with performance_report():
    futures = client.map(f, range(5))
    results = client.gather(futures)
Note that this will save the final file only once the computation has stopped (either because it was completed or due to an error/interruption).
If you want something that saves information at specific steps/intervals, the simplest solution is to split the computation into chunks and save a report for each chunk. Note that such hacky saves will reduce the efficiency of parallelisation, since some workers might end up idle during frequent .gather or .compute operations.
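A minimal sketch of that chunked approach, assuming a local in-process cluster and a placeholder function f (the chunk size of 5 is arbitrary; tune it to your workload):

```python
from dask.distributed import Client, performance_report
import time

def f(x):
    time.sleep(0.1)  # stand-in for real work
    return x * 2

client = Client(processes=False)  # in-process scheduler, just for the example

data = list(range(10))
chunk_size = 5  # hypothetical chunk size
results = []
for i in range(0, len(data), chunk_size):
    chunk = data[i:i + chunk_size]
    # each chunk writes its own report file, e.g. report-0.html, report-5.html,
    # so you get a snapshot even if a later chunk fails or is interrupted
    with performance_report(filename=f"report-{i}.html"):
        futures = client.map(f, chunk)
        results.extend(client.gather(futures))

client.close()
```

The trade-off is visible in the loop: each .gather is a synchronisation point, so workers that finish their part of a chunk early sit idle until the next chunk is submitted.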
Another option is to periodically dump the contents of client.get_task_stream(), which returns a tuple containing a dictionary for each task on the scheduler. This can be done either after completion of each future with as_completed, or with some periodicity (e.g. every n seconds within a while loop). I can't think of a general solution, but it should be possible.
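A rough sketch of the as_completed variant, assuming a local cluster and an arbitrary dump interval of every 4 completed tasks (the filename and interval are illustrative, not prescribed):

```python
import json
from dask.distributed import Client, as_completed

def f(x):
    return x + 1

client = Client(processes=False)  # local threaded cluster for the example
futures = client.map(f, range(8))

results = []
for i, future in enumerate(as_completed(futures), start=1):
    results.append(future.result())
    if i % 4 == 0:  # dump every 4 completed tasks (arbitrary interval)
        # get_task_stream returns one record (a dict) per finished task;
        # some values are not JSON-serialisable, so stringify them first
        records = client.get_task_stream()
        with open("task_stream.json", "w") as fh:
            json.dump([{k: str(v) for k, v in rec.items()} for rec in records], fh)

client.close()
```

Each record includes fields like the task key, the worker it ran on, and start/stop timestamps, so the dumped file gives you a coarse timeline of progress even without the dashboard.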
You might also find the links in this answer useful.
Upvotes: 1