Reputation: 1
I am working with climate data from the Levante supercomputer in a Jupyter Notebook, but the computations are taking a considerable amount of time. Does anyone have experience with Dask or other parallelisation tools that could speed them up?
Here is an example of a simple operation I run to get the yearly wind-speed means for a given location:
import intake
import xarray as xr

# data_park is an xarray.DataArray (loaded earlier via intake);
# this single line takes a lot of time
vals = data_park.groupby('time.year').mean().plot()
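For context, a minimal sketch of how this pattern behaves when the array is backed by Dask chunks (the data here is a synthetic stand-in for `data_park`, since the original loading code is not shown): `groupby(...).mean()` on a chunked array builds a lazy task graph, and only `compute()` (or `.plot()`, which computes implicitly) triggers the parallel work.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for data_park: two years of hourly wind speeds,
# chunked with Dask so the groupby-mean can run in parallel.
time = pd.date_range("2020-01-01", "2021-12-31 23:00", freq="h")
data_park = xr.DataArray(
    np.random.default_rng(0).rayleigh(8.0, size=time.size),
    coords={"time": time},
    dims="time",
    name="wind_speed",
).chunk({"time": 24 * 90})  # roughly 90-day chunks

# The groupby is lazy on a Dask-backed array; compute() runs the graph.
yearly_means = data_park.groupby("time.year").mean().compute()
print(yearly_means.values)  # one mean per year
```

If `data_park` is currently loaded without chunks, the whole array sits in memory as NumPy and the groupby runs serially; opening it with a `chunks=` argument (or calling `.chunk(...)` as above) is what lets a Dask client parallelise it.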
So far I have tried setting up a client with 25 workers, 5 threads per worker, and a 30 GB memory limit per worker, but I am unsure whether this combination is suitable.
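For reference, this is the kind of `dask.distributed` client setup I mean; the numbers below are deliberately small placeholders, not a recommendation for Levante. One thing to check with any sizing is that workers × memory limit fits the node's RAM (25 workers × 30 GB = 750 GB, which is more than a typical single node provides).

```python
from dask.distributed import Client

# Sketch of a local cluster; the sizes are placeholders to adjust
# to the actual node (cores and RAM of the allocation).
client = Client(
    n_workers=4,            # separate worker processes
    threads_per_worker=2,   # 4 x 2 = 8 cores in use
    memory_limit="4GB",     # per-worker cap; workers spill to disk near it
)
print(client)               # summary of workers, threads and memory
client.close()
```

The dashboard URL printed in the client summary is useful for seeing whether the workers are actually busy or mostly waiting on I/O.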
Thanks in advance! :)
Upvotes: 0
Views: 28