Reputation: 73
I'm currently processing some ocean model outputs. At each time step, it has 42*1800*3600 grid points.
I found that the bottelneck in my program is the slicing, and calling xarray_built in method to extract the values. And what's more interesting, same syntax sometimes require a vastly differnt amount of time.
ds = xarray.open_dataset(filename, decode_times=False)
vvel0=ds.VVEL.sel(lat=slice(-60,-20),lon=slice(0,40))/100 #in CCSM output, unit is cm/s convert to m/s
uvel0=ds.UVEL.sel(lat=slice(-60,-20),lon=slice(0,40))/100 ## why the speed is that different? now it's regional!!
temp0=ds.TEMP.sel(lat=slice(-60,-20),lon=slice(0,40)) #de
Take this for example, reading a VVEL and UVEL took ~4sec, while reading in TEMP only needed ~6ms. Without slicing, VVEL and UVEL took ~1sec, and TEMP needed 120 nanosecond.
I always thought that, when I only input part of the full array, I need less memory, and therefore less time. It turned out, that XARRAY loads in the full array and any extra slicing takes more time. But, could somebody please explain why is reading different variables from the same netcdf file takes that different of time?
The program is designed to extract a stepwise section, and calculate the cross-sectional heat transport, so I need to pick out either UVEL or VVEL, times that by TEMP along the section. So, it may seems that, loading in TEMP that fast is good, isn't it?
Unfortunately, that's not the case. When I loop through about ~250 grid points along the prescribed section...
# Calculate VT flux orthogonal to the chosen grid cells, which is the heat transport across GOODHOPE line
vtflux=[]
utflux=[]
vap = vtflux.append
uap = utflux.append
#for i in range(idx_north,idx_south+1):
for i in range(10):
yidx=gh_yidx[i]
xidx=gh_xidx[i]
lon_next=ds_lon[i+1].values
lon_current=ds_lon[i].values
lat_next=ds_lat[i+1].values
lat_current=ds_lat[i].values
tt=np.squeeze(temp[:,yidx,xidx].values) #<< calling values is slow
if (lon_next<lon_current) and (lat_next==lat_current): # The condition is incorrect
dxlon=Re*np.cos(lat_current*np.pi/180.)*0.1*np.pi/180.
vv=np.squeeze(vvel[:,yidx,xidx].values)
vt=vv*tt
vtdxdz=np.dot(vt[~np.isnan(vt)],layerdp[0:len(vt[~np.isnan(vt)])])*dxlon
vap(vtdxdz)
#del vtdxdz
elif (lon_next==lon_current) and (lat_next<lat_current):
#ut=np.array(uvel[:,gh_yidx[i],gh_xidx[i]].squeeze().values*temp[:,gh_yidx[i],gh_xidx[i]].squeeze().values) # slow
uu=np.squeeze(uvel[:,yidx,xidx]).values # slow
ut=uu*tt
utdxdz=np.dot(ut[~np.isnan(ut)],layerdp[0:len(ut[~np.isnan(ut)])])*dxlat
uap(utdxdz) #m/s*degC*m*m ## looks fine, something wrong with the sign
#del utdxdz
total_trans=(np.nansum(vtflux)-np.nansum(utflux))*3996*1026/1e15
Especially this line:
tt=np.squeeze(temp[:,yidx,xidx].values)
It takes ~3.65 Sec, but now it has to be repeated for ~250 times. If I remove .values
, then this time reduces to ~4ms. But I need to time the tt
to vt
, so I have to extract the values. What's weird, is that the similar expression, vv=np.squeeze(vvel[:,yidx,xidx].values)
requires much less time, only about ~1.3ms.
To summarize my questions:
Thank you!
Upvotes: 3
Views: 3441
Reputation: 9613
When you index a variable loaded from a netCDF file, xarray doesn't load it into memory immediately. Instead, we create a lazy array that supports any number of further differed indexing operations. This is true even if you aren't using dask.array (triggered by setting chunks=
in open_dataset
or using open_mfdataset
).
This explains the surprising performance you observe. Calculating temp0
is fast, because it doesn't load any data from disk. vvel0
is slow, because dividing by 100 requires loading the data into memory as a numpy array.
Later, it's slower to index temp0
because each operation loads data from disk, instead of indexing a numpy array already in memory.
The work-around is to explicitly load the portion of your dataset that you need into memory first, e.g., by writing temp0.load()
. The netCDF section of the xarray docs also gives this tip.
Upvotes: 5