Reputation: 16987
Say I have an xarray.Dataset
object loaded in using xarray.open_dataset(..., decode_times=False)
that looks like this when printed:
<xarray.Dataset>
Dimensions: (bnds: 2, lat: 15, lon: 34, plev: 8, time: 3650)
Coordinates:
* time (time) float64 3.322e+04 3.322e+04 3.322e+04 3.322e+04 ...
* plev (plev) float64 1e+05 8.5e+04 7e+04 5e+04 2.5e+04 1e+04 5e+03 ...
* lat (lat) float64 40.46 43.25 46.04 48.84 51.63 54.42 57.21 60.0 ...
* lon (lon) float64 216.6 219.4 222.2 225.0 227.8 230.6 233.4 236.2 ...
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) float64 3.322e+04 3.322e+04 3.322e+04 3.322e+04 ...
lat_bnds (lat, bnds) float64 39.07 41.86 41.86 44.65 44.65 47.44 47.44 ...
lon_bnds (lon, bnds) float64 215.2 218.0 218.0 220.8 220.8 223.6 223.6 ...
hus (time, plev, lat, lon) float64 0.006508 0.007438 0.008751 ...
What would be the best way to subset this given multiple ranges for lat, lon, and time? I've tried chaining a series of conditions using xarray.Dataset.where, but I get an error saying:
IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().
I can't load the entire dataset into memory, so what would be the typical way to do this?
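For reference, the kind of chained .where condition described above might look like the following minimal sketch on a toy in-memory dataset (dimension sizes and coordinate values here are invented for illustration). In memory this works; on a lazy netCDF4-backed dataset opened with open_dataset and not loaded, this style of array-based indexing is what can raise the IndexError quoted above:

```python
import numpy as np
import xarray as xr

# Toy in-memory stand-in for the dataset in the question
# (sizes and coordinate values are made up).
ds = xr.Dataset(
    {"hus": (("time", "lat", "lon"), np.random.rand(6, 4, 5))},
    coords={
        "time": np.arange(6, dtype=float),
        "lat": np.linspace(40.0, 60.0, 4),
        "lon": np.linspace(216.0, 250.0, 5),
    },
)

# Chained range conditions via .where, dropping everything outside the box.
subset = ds.where(
    (ds.lat >= 45) & (ds.lat <= 55) & (ds.lon >= 220) & (ds.lon <= 240),
    drop=True,
)
```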
Upvotes: 4
Views: 6980
Reputation: 9603
netCDF4 doesn't support all of the multi-dimensional indexing operations that NumPy does, but it does support slicing (which is very fast) and one-dimensional indexing (somewhat slower).
Some things to try:
1. Use slicing where possible, e.g., .sel(time=slice(start, end)).
2. Convert the dataset to use Dask (e.g., with .chunk()) before indexing with 1-dimensional arrays. This should offload the array-based indexing from netCDF4 to Dask/NumPy.
If that doesn't work, post a full self-contained example to the xarray issue tracker on GitHub and we can take a look at it in more detail.
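On a toy in-memory dataset standing in for the real file (coordinate values and sizes are invented), those suggestions might look like this sketch:

```python
import numpy as np
import xarray as xr

# Small in-memory stand-in for the dataset in the question.
ds = xr.Dataset(
    {"hus": (("time", "lat", "lon"), np.random.rand(10, 5, 8))},
    coords={
        "time": np.arange(10, dtype=float),
        "lat": np.linspace(40.0, 60.0, 5),
        "lon": np.linspace(216.0, 250.0, 8),
    },
)

# 1. Label-based slicing: fast, and works directly on lazy netCDF4 variables.
subset = ds.sel(time=slice(2.0, 6.0), lat=slice(45.0, 55.0))

# One-dimensional indexing with explicit labels also works, just more
# slowly on a netCDF4 backend.
picked = ds.sel(lat=[45.0, 55.0])

# 2. With dask installed, chunking first would offload array-based
#    indexing from netCDF4 to Dask/NumPy:
# ds_chunked = ds.chunk({"time": 5})
# picked_lazy = ds_chunked.sel(lat=[45.0, 55.0])
```

Note that .sel with slice objects is inclusive of both endpoints for label-based indexing, unlike positional Python slicing.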
Upvotes: 3