Shrad

Reputation: 81

Regarding xarray apply_ufunc

I am trying to calculate daily Tmax from a 3-hourly global dataset. I can do it with groupby, but I would like to know how to reduce the computation time using dask's parallel operations (e.g. via apply_ufunc). If there is good documentation on apply_ufunc, please let me know; the xarray documentation wasn't detailed enough for me and left me a little confused, as I have no prior experience with dask. Thanks!

Here is what my code looks like:

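    import xarray as xr
    # Open lazily with dask; each chunk holds 8 time steps (one day of 3-hourly data)
    TAS = xr.open_dataset(INFILE_template.format(YR, YR), chunks={'time': 8})
    DAYMAX = TAS.groupby('time.dayofyear').max(dim='time')
    DAYMAX.to_netcdf(OUTFILE_template.format(YR, YR))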

The dimensions of TAS are as follows:

    <xarray.Dataset>
    Dimensions:  (lat: 720, lon: 1440, time: 2928)
    Coordinates:
    * lon      (lon) float64 0.125 0.375 0.625 0.875 1.125 1.375 1.625 1.875 ...
    * lat      (lat) float64 -89.88 -89.62 -89.38 -89.12 -88.88 -88.62 -88.38 ...
    * time     (time) datetime64[ns] 2008-01-01 2008-01-01T03:00:00 ...
    Data variables:
    tas      (time, lat, lon) float32 dask.array<shape=(2928, 720, 1440),   

Upvotes: 1

Views: 1448

Answers (1)

shoyer

Reputation: 9593

If you can already write your analysis with groupby() and other xarray methods, those are already parallelized with dask. apply_ufunc makes it easier to wrap new functionality so that it supports xarray and dask, but all the built-in routines in xarray already use apply_ufunc or something similar internally to support dask.
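
To make the point above concrete, here is a minimal sketch on a small synthetic array (the sizes, the random values, and the helper name `daily_max_ufunc` are made up for illustration, not taken from the question's files). It computes the daily maximum three ways: with the built-in groupby reduction, with the built-in resample reduction, and by wrapping plain `np.max` with apply_ufunc. It assumes dask is installed, and that wrapping `np.max` is only a stand-in for a function xarray does not already provide.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the 3-hourly data: 4 days x 8 steps on a tiny grid.
time = pd.date_range("2008-01-01", periods=32, freq=pd.Timedelta(hours=3))
rng = np.random.default_rng(0)
tas = xr.DataArray(
    rng.normal(280.0, 10.0, size=(32, 3, 4)).astype("float32"),
    dims=("time", "lat", "lon"),
    coords={"time": time},
    name="tas",
)

# Back the array with dask, one day (8 steps) per chunk, as in the question.
tas_dask = tas.chunk({"time": 8})

# Built-in reductions: these stay lazy until .compute() and run through dask.
daymax = tas_dask.groupby("time.dayofyear").max(dim="time").compute()
daymax_resampled = tas_dask.resample(time="1D").max().compute()


def daily_max_ufunc(group):
    """Hypothetical helper: reduce one day's data with a plain NumPy function."""
    return xr.apply_ufunc(
        np.max,
        group.chunk({"time": -1}),   # the core dimension must sit in one chunk
        input_core_dims=[["time"]],  # 'time' is moved to the last axis
        kwargs={"axis": -1},         # so reduce over the last axis
        dask="parallelized",
        output_dtypes=[group.dtype],
    )


# Same result via apply_ufunc, applied to each day-of-year group.
daymax_ufunc = tas_dask.groupby("time.dayofyear").map(daily_max_ufunc).compute()
```

All three results hold the same values here, so for a reduction like max there is nothing to gain from apply_ufunc. It becomes useful when the per-block function is something xarray has no built-in method for.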

As a side note: if you could kindly elaborate on what you found confusing or missing from the xarray docs, we are always looking to improve them!

Upvotes: 2
