Reputation: 1766
I know the x and the y indices of a 2D array (numpy indexing).
According to this documentation, xarray uses orthogonal (outer, Fortran-style) indexing when it is given lists of indices.
So when I pass e.g.
ind_x = [1, 2]
ind_y = [3, 4]
I expect 2 values for the index pairs (1,3) and (2,4), but xarray returns a 2x2 matrix.
How can I achieve NumPy-like (pointwise) indexing with xarray?
Note: I want to avoid loading the whole data into memory, so using the .values API is not part of the solution I am looking for.
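To make the difference concrete, here is a minimal sketch in plain NumPy. The toy array `a` is made up for illustration; `np.ix_` reproduces the orthogonal behaviour I see from xarray, while plain list indexing gives the pointwise result I am after.

```python
import numpy as np

# Toy 5x6 array (illustrative only); a[i, j] == 6 * i + j
a = np.arange(30).reshape(5, 6)

ind_x = [1, 2]
ind_y = [3, 4]

# NumPy fancy indexing pairs the indices: (1, 3) and (2, 4)
pointwise = a[ind_x, ind_y]

# Orthogonal (outer) indexing takes every combination of ind_x and ind_y
orthogonal = a[np.ix_(ind_x, ind_y)]

print(pointwise.shape)   # (2,)
print(orthogonal.shape)  # (2, 2)
```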
Upvotes: 0
Views: 391
Reputation: 1766
To take speed into account, I benchmarked three different methods:
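from pathlib import Path
from typing import List
import numpy as np
import xarray
from netCDF4 import Dataset  # assumption: Dataset is netCDF4.Dataset

def method_1(file_paths: List[Path], indices) -> List[np.ndarray]:
    # netCDF4: index the variable directly, file by file
    data = []
    for file in file_paths:
        d = Dataset(file, 'r')
        data.append(d.variables['hrv'][indices])
        d.close()
    return data

def method_2(file_paths: List[Path], indices) -> List[np.ndarray]:
    # xarray: load the whole variable into memory (.values), then index with NumPy
    data = []
    for file in file_paths:
        data.append(xarray.open_dataset(file, engine='h5netcdf').hrv.values[indices])
    return data

def method_3(file_paths: List[Path], indices) -> List[np.ndarray]:
    # xarray + dask: pointwise indexing via vindex, then compute
    data = []
    for file in file_paths:
        data.append(xarray.open_mfdataset([file], engine='h5netcdf').hrv.data.vindex[indices].compute())
    return data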
In [1]: len(file_paths)
Out[1]: 4813
The results:
My guess is that xarray+dask spends too much time in the .compute step.
Upvotes: 0
Reputation: 7023
You can access the underlying numpy array to index it directly:
import xarray as xr
x = xr.tutorial.load_dataset("air_temperature")
ind_x = [1, 2]
ind_y = [3, 4]
print(x.air.data[0, ind_y, ind_x].shape)
# (2,)
Edit:
Assuming you have your data in a dask-backed xarray and don't want to load all of it into memory, you need to use vindex on the dask array behind the xarray data object:
import xarray as xr
# simple chunk to convert to dask array
x = xr.tutorial.load_dataset("air_temperature").chunk({"time":1})
extract = x.air.data.vindex[0, ind_y, ind_x]
print(extract.shape)
# (2,)
print(extract.compute())
# array([267.1, 274.1], dtype=float32)
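As an alternative to going through vindex, xarray also supports pointwise (vectorized) indexing itself when the indexers are DataArray objects that share a dimension. Below is a minimal sketch on a made-up array; the dimension names "x" and "y" and the new dimension name "points" are arbitrary choices for illustration, not taken from your files. On a dask-backed array the selection should stay lazy until you call .compute() or .values, but I have not timed it against the methods in the question.

```python
import numpy as np
import xarray as xr

# Toy 5x6 DataArray (illustrative only); da[i, j] == 6 * i + j
da = xr.DataArray(np.arange(30).reshape(5, 6), dims=("x", "y"))

# Plain lists trigger orthogonal (outer) indexing: every combination
orthogonal = da.isel(x=[1, 2], y=[3, 4])

# Indexers that share a dimension ("points" is an arbitrary name)
# are paired up element by element: (1, 3) and (2, 4)
ind_x = xr.DataArray([1, 2], dims="points")
ind_y = xr.DataArray([3, 4], dims="points")
pointwise = da.isel(x=ind_x, y=ind_y)

print(orthogonal.shape)           # (2, 2)
print(pointwise.dims)             # ('points',)
print(pointwise.values.tolist())  # [9, 16]
```

The result has a single dimension named after the shared indexer dimension, so it matches the shape NumPy would return for the same index pairs.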
Upvotes: 1