jspaeth

Reputation: 335

When is xarray's `xr.apply_ufunc(..., dask='parallelized')` fast?

I open data from the ERA5 Google Cloud Zarr archive and do some preprocessing (changing the time resolution, selecting the Northern Hemisphere only, etc.), with all operations applied to dask data.

This is what the xarray DataArray looks like:

(screenshot: the DataArray repr, showing dask-backed data)

Then I apply a numpy function that works on multi-dimensional data using `xr.apply_ufunc`:

(screenshot: the `xr.apply_ufunc` call)
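Since the actual call is only visible in the screenshot, here is a minimal sketch of what such a call typically looks like. The body of `_to_lon_lat` and the array shape are assumptions for illustration; only the function name comes from the question/answer:

```python
import numpy as np
import xarray as xr

# Toy stand-in (assumption) for the multi-dimensional numpy function
# from the screenshot; it reduces over the spatial core dimensions.
def _to_lon_lat(arr):
    # arr arrives with shape (..., latitude, longitude)
    return arr.mean(axis=(-2, -1))

da = xr.DataArray(
    np.random.rand(10, 4, 6),
    dims=("time", "latitude", "longitude"),
).chunk({"time": 5})  # dask-backed, as in the question

result = xr.apply_ufunc(
    _to_lon_lat,
    da,
    input_core_dims=[["latitude", "longitude"]],
    dask="parallelized",
    output_dtypes=[da.dtype],
)
# result is still lazy; its core dims were consumed by the function
```

Note that with `dask="parallelized"` the core dimensions must not be chunked, which is why only `time` is chunked here.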

I tested two scenarios: one where the data is loaded first and the ufunc is applied to numpy arrays, and one where the ufunc is applied to dask data and the result is computed afterwards.

Option 1: compute the data first

(screenshot: option 1 code)

This is fast.

Option 2: compute the result later

(screenshot: option 2 code)

This is slow.
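The two options shown in the screenshots can be sketched as follows (the body of `_to_lon_lat` and the array sizes are assumptions for illustration):

```python
import numpy as np
import xarray as xr

def _to_lon_lat(arr):  # toy stand-in (assumption) for the real ufunc
    return arr.mean(axis=(-2, -1))

da = xr.DataArray(
    np.random.rand(400, 50, 100),
    dims=("time", "latitude", "longitude"),
).chunk({"time": 50})

# option 1: load everything into memory first, then apply on numpy
r1 = xr.apply_ufunc(
    _to_lon_lat, da.compute(),
    input_core_dims=[["latitude", "longitude"]],
)

# option 2: apply lazily on dask, compute the result afterwards
r2 = xr.apply_ufunc(
    _to_lon_lat, da,
    input_core_dims=[["latitude", "longitude"]],
    dask="parallelized", output_dtypes=[da.dtype],
).compute()
```

Both options produce identical results; for data that already fits in memory, option 2 only adds graph construction and scheduling work on top of the same computation.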

Question: Why is option 1 so much faster?

I'd think that `dask='parallelized'` distributes the work across multiple cores. Is numpy doing that as well? Why is applying the ufunc to numpy data still so much faster?

As soon as the dataset becomes large, it will be unfeasible to load the data into memory first (option 1). Therefore, I want to make sure that I'm not doing something stupid that makes `dask='parallelized'` very slow.

Thanks.

Upvotes: 0

Views: 683

Answers (1)

JonasV

Reputation: 1031

If the array is small enough to fit comfortably into memory, the overhead of spinning up dask workers, scheduling them, and the general I/O of splitting the data, computing it, and stitching it back together makes dask a lot slower.

Dask helps you scale up. It isn't good for speeding up array computations that already fit completely into memory.

Whether what you are doing is stupid really depends on what the function `_to_lon_lat` does. In general, though, if what you do in that function can be done with native xarray functions, you shouldn't use `apply_ufunc`; use those xarray functions instead, since all of them work with dask out of the box.
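To illustrate: if `_to_lon_lat` were, say, just a spatial mean (an assumption, since its body isn't shown), the native xarray reduction handles dask without any `apply_ufunc` wrapping:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.rand(10, 4, 6),
    dims=("time", "latitude", "longitude"),
).chunk({"time": 5})  # dask-backed

# Native xarray reductions are dask-aware out of the box:
result = da.mean(["latitude", "longitude"])  # still lazy
print(result.compute().shape)  # (10,)
```

Here xarray builds the dask graph itself and picks sensible chunk-wise operations, so you get the lazy, scalable behaviour without hand-wiring core dimensions and output dtypes.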

Upvotes: 1

Related Questions