Reputation: 6035
I have 37 years of NetCDF files with a daily time step, and I need to compute a function on the full time series of each grid cell (13513 days). The computation is repeated for every cell. For this I am using xarray with a da.sel approach, but it is very slow and does not make use of the multiple cores of my laptop. I am struggling to figure out how to use dask in this scenario. Any suggestions to improve/speed up the code?
import numpy as np
import xarray as xr

# df holds one row per cell, with columns X (longitude) and Y (latitude)
for c in range(len(df)):
    arr = np.array([])
    lon = df.X[c]
    lat = df.Y[c]
    # loop over the 37 yearly files (1979-2015)
    for yr in range(1979, 2016):
        ds = xr.open_dataset('D:/pr_' + str(yr) + '.nc')
        # nearest-neighbour time series at this cell ('var' is the variable name)
        da = ds['var'].sel(lon=lon, lat=lat, method='nearest')
        arr = np.concatenate([arr, da])
    fun = function(arr)
Upvotes: 1
Views: 720
Reputation: 57251
It seems like you're looking for xarray.open_mfdataset:

ds = xr.open_mfdataset('D:/pr_*.nc')
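For example, here is a minimal sketch (assuming the yearly files match pr_*.nc and the data variable is called 'pr'; adjust both to your data) that opens all 37 years as one lazy, dask-backed dataset and pulls the full series at one cell:

import xarray as xr

# one lazy dataset over all yearly files; chunks= makes the arrays dask-backed
# and parallel=True opens the files concurrently
ds = xr.open_mfdataset('D:/pr_*.nc', combine='by_coords',
                       parallel=True, chunks={'time': 365})

# the full 13513-day series at one cell (lon/lat as in your loop),
# with no per-year loop or concatenation
series = ds['pr'].sel(lon=lon, lat=lat, method='nearest').values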
Your code is also particularly slow because you repeatedly call np.concatenate. Every call copies all of the data you have accumulated so far, so the total cost grows quadratically with the number of years.
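If you do keep the per-year loop, a sketch of the linear-cost alternative: collect the yearly pieces in a list and concatenate once at the end, so each value is copied only once ('var', the file names, and lon/lat are taken from your snippet):

import numpy as np
import xarray as xr

pieces = []
for yr in range(1979, 2016):
    ds_yr = xr.open_dataset('D:/pr_' + str(yr) + '.nc')
    pieces.append(ds_yr['var'].sel(lon=lon, lat=lat, method='nearest').values)

# a single concatenate at the end copies each element once, instead of
# re-copying the growing array on every iteration
arr = np.concatenate(pieces)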
Upvotes: 1