Reputation: 6427
Hi all. I'm using a Dask Distributed cluster to write Zarr- and Dask-backed xarray Datasets inside a loop, and each dataset.to_zarr call blocks. This can really slow things down when straggler chunks hold up the continuation of the loop. Is there a way to run .to_zarr asynchronously, so that the loop can move on to the next dataset write without being held up by a few straggler chunks?
Upvotes: 3
Views: 1059
Reputation: 28683
With the distributed scheduler, you get async behaviour without any special effort. If you simply call arr.to_zarr, then indeed you will wait for completion. However, you could do the following:
from dask.distributed import Client

client = Client(...)
# compute=False builds the task graph without executing the write
out = arr.to_zarr(..., compute=False)
# submit the graph to the cluster; returns immediately with a future
fut = client.compute(out)
This returns a future, fut, whose status reflects the current state of the whole computation. You can choose whether to wait on it or to continue submitting new work. You can also attach it to a progress bar (in the notebook), which will update asynchronously whenever the kernel is not busy.
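To make this concrete for the looping case in the question, here is a minimal sketch of the pattern: each iteration schedules work with client.compute and stores the future, and the loop only blocks once at the end. To keep the sketch runnable without a zarr store, a cheap array reduction stands in for arr.to_zarr(..., compute=False); the structure is identical when the delayed object comes from a real to_zarr call.

```python
from dask.distributed import Client, wait
import dask.array as da

client = Client(processes=False)  # lightweight in-process cluster for the example

futures = []
for i in range(3):
    arr = da.ones((100, 100), chunks=(50, 50)) * i
    # Placeholder for arr.to_zarr(path_i, compute=False): a lazy object
    # that has not been computed yet.
    delayed = arr.sum()
    # client.compute schedules the work and returns immediately.
    fut = client.compute(delayed)
    futures.append(fut)
    # The loop continues here without waiting on straggler chunks.

wait(futures)              # block only once, after all writes are submitted
results = client.gather(futures)
client.close()
```

Because all the writes are in flight at once, a straggler chunk in one dataset no longer delays the submission of the next one; the scheduler interleaves them.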
Upvotes: 5