dl.meteo

Reputation: 1766

How to achieve numpy indexing with xarray Dataset

I know the x and the y indices of a 2D array (numpy indexing).

According to the documentation, xarray indexes integer lists orthogonally (outer indexing, as in Fortran), not pointwise as NumPy does.

So when I pass e.g.

ind_x = [1, 2]
ind_y = [3, 4]

I expect 2 values for the index pairs (1,3) and (2,4), but xarray returns a 2x2 matrix.
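
To make the difference concrete, here is a minimal NumPy-only sketch on a toy array (the array and the name `arr` are made up for illustration). It contrasts the pointwise result I expect with the outer, 2x2 result that orthogonal indexing produces; `np.ix_` is used here only to emulate the orthogonal behaviour in NumPy.

```python
import numpy as np

# Toy 2D array (hypothetical data): arr[i, j] == 6 * i + j
arr = np.arange(30).reshape(5, 6)

ind_x = [1, 2]
ind_y = [3, 4]

# NumPy (pointwise) indexing: pairs (1, 3) and (2, 4) -> 2 values
pointwise = arr[ind_x, ind_y]

# Orthogonal (outer) indexing: every combination of ind_x and ind_y -> 2x2 matrix
outer = arr[np.ix_(ind_x, ind_y)]

print(pointwise.shape)  # (2,)
print(outer.shape)      # (2, 2)
```

The pointwise values are the diagonal of the outer result, which is why the 2x2 matrix contains the values I want plus two I do not.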

How can I achieve NumPy-like (pointwise) indexing with xarray?

Note: I want to avoid loading all the data into memory, so using the .values API is not part of the solution I am looking for.

Upvotes: 0

Views: 391

Answers (2)

dl.meteo

Reputation: 1766

To take speed into account, I timed three different methods.

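from pathlib import Path
from typing import List

import numpy as np
import xarray
from netCDF4 import Dataset


def method_1(file_paths: List[Path], indices) -> List[np.ndarray]:
    # Index the netCDF4 variable directly, one file at a time.
    data = []
    for file in file_paths:
        d = Dataset(file, 'r')
        data.append(d.variables['hrv'][indices])
        d.close()
    return data


def method_2(file_paths: List[Path], indices) -> List[np.ndarray]:
    # .values loads the whole variable into memory before indexing.
    data = []
    for file in file_paths:
        data.append(xarray.open_dataset(file, engine='h5netcdf').hrv.values[indices])
    return data


def method_3(file_paths: List[Path], indices) -> List[np.ndarray]:
    # vindex indexes the dask array pointwise; compute() triggers the actual read.
    data = []
    for file in file_paths:
        data.append(xarray.open_mfdataset([file], engine='h5netcdf').hrv.data.vindex[indices].compute())
    return data

In [1]: len(file_paths)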
Out[1]: 4813

The results:

  • method_1 (using the netCDF4 library): 101.9s
  • method_2 (using xarray and values API): 591.4s
  • method_3 (using xarray+dask): 688.7s

I guess that xarray+dask spends too much time in the .compute() step.

Upvotes: 0

Val

Reputation: 7023

You can access the underlying numpy array to index it directly:

import xarray as xr

x = xr.tutorial.load_dataset("air_temperature")

ind_x = [1, 2]
ind_y = [3, 4]

print(x.air.data[0, ind_y, ind_x].shape)
# (2,)

Edit:

Assuming you have your data in a dask-backed xarray and don't want to load all of it into memory, you need to use vindex on the dask array behind the xarray data object:

import xarray as xr

# chunking converts the data to a dask array
x = xr.tutorial.load_dataset("air_temperature").chunk({"time":1})

ind_x = [1, 2]
ind_y = [3, 4]

extract = x.air.data.vindex[0, ind_y, ind_x]

print(extract.shape)
# (2,)

print(extract.compute())
# array([267.1, 274.1], dtype=float32)
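
As a complement, here is a minimal sketch of the same pointwise selection done at the xarray level instead of on the underlying array. It relies on xarray's vectorized indexing: when the indexers passed to `isel` are `DataArray` objects sharing a dimension, the values are picked pairwise along that dimension. The toy array, the dimension names `time`/`y`/`x`, and the new dimension name `points` are assumptions for illustration, not part of the question's data. Whether this avoids reading extra data from disk depends on the backend and chunking, which this sketch does not measure.

```python
import numpy as np
import xarray as xr

# Toy 3D array (hypothetical data): da[t, y, x] == 30 * t + 6 * y + x
da = xr.DataArray(np.arange(60).reshape(2, 5, 6), dims=("time", "y", "x"))

ind_x = [1, 2]
ind_y = [3, 4]

# Plain lists are indexed orthogonally: every (y, x) combination is returned
outer = da.isel(x=ind_x, y=ind_y)

# DataArray indexers sharing the dimension "points" are indexed pointwise
points_x = xr.DataArray(ind_x, dims="points")
points_y = xr.DataArray(ind_y, dims="points")
picked = da.isel(x=points_x, y=points_y)

print(outer.sizes)    # 2 values along y and 2 along x per time step
print(picked.sizes)   # 2 values along "points" per time step
print(picked.isel(time=0).values)
```

The result keeps its xarray labels, with the `y` and `x` dimensions replaced by the single `points` dimension.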

Upvotes: 1
