Alejandro

Reputation: 573

Can dask dashboard be used on SageMaker (Labs 1.2.*)?

I don't have browser access to the lab environment, and the available dask extension for lab hasn't worked for me so far. I want to see progress and performance data for my dask projects, but have had no luck yet.

compute() sometimes takes hours (that's OK, there is a lot of data), and I feel blind throughout the process.

Do you know if there is anything supported by labs 1.2? No luck with progressbar either. The instance is on AWS SageMaker, so I cannot access the web dashboard.

Thanks!

Upvotes: 3

Views: 636

Answers (1)

SultanOrazbayev

Reputation: 16551

It is possible to save most of the dashboard information in a file with performance_report (see docs):

from dask.distributed import Client, performance_report
import time

client = Client()

def f(x):
    time.sleep(x)
    return x

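# Everything computed inside the block is recorded in the report;
# "dask-report.html" is the default filename, spelled out here for clarity.
with performance_report(filename="dask-report.html"):
    futures = client.map(f, range(5))
    results = client.gather(futures)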

Note that this will save the final file only once the computation has stopped (either because it was completed or due to an error/interruption).

If you want something that will save information at specific steps/intervals, the simplest solution would be to split the computation into chunks and save a report for each chunk. Note that such intermediate saves will reduce the efficiency of parallelisation, since some of the workers might end up idle while each .gather or .compute call waits for its slowest task.
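A minimal sketch of the chunked approach, assuming the work can be expressed as a function mapped over a list of inputs. The names chunked, run_in_chunks, chunk_size and report_prefix are made up for this illustration and are not part of dask; the import of performance_report is deferred only so that the chunking helper can be tried without a cluster.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_in_chunks(client, func, inputs, chunk_size, report_prefix="dask-report"):
    """Map `func` over `inputs` chunk by chunk, writing one report per chunk.

    `client` is expected to be a dask.distributed.Client.
    """
    # Deferred import: only needed once a real client is in use.
    from dask.distributed import performance_report

    results = []
    for i, chunk in enumerate(chunked(inputs, chunk_size)):
        # One HTML file per chunk, e.g. dask-report-000.html, dask-report-001.html
        with performance_report(filename=f"{report_prefix}-{i:03d}.html"):
            futures = client.map(func, chunk)
            # gather blocks until the whole chunk is done; workers that
            # finish early sit idle until the next chunk is submitted.
            results.extend(client.gather(futures))
    return results
```

With the client and f from the example above, run_in_chunks(client, f, range(5), chunk_size=2) would write three report files, each of which can be downloaded and opened locally as soon as its chunk finishes.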

Another option is to periodically dump the contents of client.get_task_stream(), which returns a tuple containing one dictionary per task recorded by the scheduler. This can be done either after each future completes (with as_completed) or at a fixed interval (e.g. every n seconds within a while loop). I can't think of a general solution, but it should be possible.
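A rough sketch of the as_completed variant, under a few assumptions: each record returned by get_task_stream() is a dictionary that can be written as JSON once non-serialisable values are converted to strings, and overwriting one snapshot file on every completion is acceptable. The names dump_task_stream and gather_with_snapshots and the default path are hypothetical. As far as I understand, the scheduler may only start collecting the task stream the first time it is asked for it, so it is probably worth calling client.get_task_stream() once before submitting any work.

```python
import json


def dump_task_stream(client, path):
    """Overwrite `path` with the current task stream, one JSON object per line.

    Returns the number of records written.
    """
    records = client.get_task_stream()
    with open(path, "w") as fh:
        for record in records:
            # default=str is a fallback for values json cannot encode.
            fh.write(json.dumps(record, default=str) + "\n")
    return len(records)


def gather_with_snapshots(client, futures, path="task-stream.jsonl"):
    """Collect results in submission order, refreshing the snapshot file
    every time a future finishes."""
    # Deferred import: only needed once a real client is in use.
    from dask.distributed import as_completed

    results = {}
    for future in as_completed(futures):
        results[future.key] = future.result()
        dump_task_stream(client, path)
    return [results[f.key] for f in futures]
```

For example, futures = client.map(f, range(5)) followed by gather_with_snapshots(client, futures) keeps task-stream.jsonl up to date while the work runs, so it can be inspected from a terminal or a notebook cell. For a time-based variant, the same dump_task_stream call could go in a loop with time.sleep(n).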

You might also find the links in this answer useful.

Upvotes: 1
