Reputation: 1569
I would like to see a progress bar in a Jupyter notebook while I'm running a compute task with Dask. I'm counting all values of the id
column from a large CSV file (4+ GB), so any ideas?
import dask.dataframe as dd
df = dd.read_csv('data/train.csv')
df.id.count().compute()
Upvotes: 36
Views: 29573
Reputation: 678
Below will show the remaining time and item counts:
from tqdm.dask import TqdmCallback

with TqdmCallback(desc="compute"):
    ...
    arr.compute()

# or use the callback globally
cb = TqdmCallback(desc="global")
cb.register()
arr.compute()
https://github.com/tqdm/tqdm#dask-integration
Upvotes: 1
Reputation: 382
This resource provides full-code examples for both cases (local and distributed) and more detailed information about using the Dask Dashboard.
Note that when working in Jupyter notebooks you may have to separate the ProgressBar().register() call and the computation you want to track (e.g. df.set_index('id').persist()) into two separate cells for the progress bar to actually appear.
DO: register the progress bar in its own cell, then run the computation in the next cell.
DON'T DO: call register() and the computation in the same cell.
Upvotes: 0
Reputation: 57281
If you're using the single machine scheduler then do this:
from dask.diagnostics import ProgressBar
ProgressBar().register()
http://dask.pydata.org/en/latest/diagnostics-local.html
If you're using the distributed scheduler then do this:
from dask.distributed import progress
result = df.id.count().persist()
progress(result)
Or just use the dashboard
http://dask.pydata.org/en/latest/diagnostics-distributed.html
Upvotes: 48