Dask Delayed with xarray - compute() result is still delayed

Question

I am using Dask and xarray to run an analysis (e.g. a mean over time) on two datasets, and then compute the difference between the two results.

This is my code:

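import dask
import xarray as xr
from dask.distributed import LocalCluster

# worker_kwargs, south, north, west and east are defined elsewhere
cluster = LocalCluster(n_workers=5, threads_per_worker=3, **worker_kwargs)

def calc_avg(path):
    # Open the multi-file dataset lazily and select the variable of interest
    ds = xr.open_mfdataset(path, combine='nested', concat_dim="time",
                           parallel=True, decode_times=False, decode_cf=False)
    subset = ds['var'].sel(lat=slice(south, north), lon=slice(west, east))
    # Mean over time for the selected region
    return subset.mean(dim='time')

def diff_(x, y):
    return x - y

p1 = "/path/to/first/multi-file/dataset"
p2 = "/path/to/second/multi-file/dataset"

a = dask.delayed(calc_avg)(p1)
b = dask.delayed(calc_avg)(p2)
total = dask.delayed(diff_)(a, b)
result = total.compute()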

The execution time here is 17 s.

However, plotting the result (result.plot()) takes more than 1 minute, so it seems that the calculation actually happens when the result is plotted.
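
A minimal sketch that appears to reproduce the behaviour without any files. It assumes a small in-memory Dask array as a stand-in for the data returned by open_mfdataset; the `calc_avg(fill)` signature, the array shape, and the fill values are illustrative only and are not part of the real code:

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr


def calc_avg(fill):
    # Stand-in for open_mfdataset(...): a lazy, Dask-backed DataArray
    # (hypothetical shape and values, chosen only for illustration).
    data = da.full((4, 3, 2), fill, chunks=(2, 3, 2))
    arr = xr.DataArray(data, dims=("time", "lat", "lon"))
    return arr.mean(dim="time")  # still lazy, nothing is evaluated here


def diff_(x, y):
    return x - y


a = dask.delayed(calc_avg)(5.0)
b = dask.delayed(calc_avg)(2.0)
result = dask.delayed(diff_)(a, b).compute()

# compute() ran the Python functions, but their return value is itself
# a lazy xarray object backed by a Dask array.
still_lazy = dask.is_dask_collection(result)

# A second compute() (or .load()) is what evaluates the array.
loaded = result.compute()
is_numpy = isinstance(loaded.data, np.ndarray)

print(still_lazy, is_numpy)
```

In this sketch the first compute() only builds the lazy difference; the numbers are produced by the second compute(), which would be consistent with the time being spent in result.plot(). Since the xarray operations are already lazy, calling the functions directly and running a single compute() on their difference may be enough, without wrapping them in dask.delayed.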

Is this the proper way to use Dask delayed?
