Ibe

Reputation: 6035

xarray dataset selection method is very slow

I have 37 years of daily NetCDF files (13,513 days in total) and need to compute a function over the full time series for every grid cell. I am using xarray with the da.sel approach, but it is very slow and does not use the multiple cores of my laptop. I am struggling to figure out how to use dask in this scenario. Any suggestions to improve or speed up the code?

for c in range(len(df)):
    arr = np.array([])
    lon = df.X[c]
    lat = df.Y[c]
    for yr in range(1979, 2016):
        ds = xr.open_dataset('D:/pr_' + str(yr) + '.nc')
        # ds.var resolves to the Dataset.var() method, not the variable;
        # index by name instead to get the data variable
        da = ds['var'].sel(lon=lon, lat=lat, method='nearest')
        arr = np.concatenate([arr, da])

    fun = function(arr)

Upvotes: 1

Views: 720

Answers (1)

MRocklin

Reputation: 57251

It seems like you're looking for xarray.open_mfdataset:

ds = xr.open_mfdataset('D:/pr_*.nc')

Your code is also particularly slow because you repeatedly call np.concatenate. Every call copies all of the data you have loaded so far, so the total cost is quadratic in the number of years.
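As a minimal sketch of the pattern (the variable name 'var' and the pr_<year>.nc naming scheme are taken from your question; the temporary files here just stand in for your yearly files), open_mfdataset builds one lazy, dask-backed dataset across all files, so a single .sel replaces the per-year loop and the repeated concatenation:

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Create two small stand-in yearly files (replace with your D:/pr_*.nc).
tmpdir = tempfile.mkdtemp()
for yr in (1979, 1980):
    time = pd.date_range(f'{yr}-01-01', f'{yr}-12-31', freq='D')
    ds = xr.Dataset(
        {'var': (('time', 'lat', 'lon'),
                 np.random.rand(len(time), 3, 3))},
        coords={'time': time, 'lat': [0.0, 1.0, 2.0], 'lon': [0.0, 1.0, 2.0]},
    )
    ds.to_netcdf(os.path.join(tmpdir, f'pr_{yr}.nc'))

# One lazy dataset over all files, concatenated along 'time';
# each file becomes a dask chunk, so work can spread across cores.
ds = xr.open_mfdataset(os.path.join(tmpdir, 'pr_*.nc'), combine='by_coords')

# One nearest-neighbour selection over the whole record:
# no per-year loop, no repeated np.concatenate.
series = ds['var'].sel(lon=0.4, lat=1.6, method='nearest')
arr = series.values  # loads only this one cell's full time series
```

The key design point is that open_mfdataset defers reading: data is only pulled from disk when you call .values (or .compute()), and only for the cell you selected.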

Upvotes: 1
