Miguel
Miguel

Reputation: 433

Python xarray - vectorized indexing

I'm trying to understand vectorized indexing in xarray by following this example from the docs:

import xarray as xr
import numpy as np
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
                  coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})

ind_x = xr.DataArray([0, 1], dims=['x'])
ind_y = xr.DataArray([0, 1], dims=['y'])

The output of the array da is as follows:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

So far so good. Now in the example there are shown two ways of indexing. Orthogonal (not interested in this case) and vectorized (what I want). For the vectorized indexing the following is shown:

In [37]: da[ind_x, ind_x]  # vectorized indexing
Out[37]: 
<xarray.DataArray (x: 2)>
array([0, 5])
Coordinates:
    y        (x) <U1 'a' 'b'
  * x        (x) int64 0 1

The result seems to be what I want, but this feels very strange to me. ind_x (which in theory refers to dims=['x']) is being passed twice but somehow is capable of indexing what appears to be both in the x and y dims. As far as I understand the x dim would be the rows and y dim would be the columns, is that correct? How come the same ind_x is capable of accessing both the rows and the cols?

This seems to be the concept I need for my problem, but can't understand how it works or how to extend it to more dimensions. I was expecting this result to be given by da[ind_x, ind_y] however that seems to yield the orthogonal indexing surprisingly enough.

Upvotes: 1

Views: 1151

Answers (1)

Huite Bootsma
Huite Bootsma

Reputation: 481

Having the example with ind_x being used twice is probably a little confusing: actually, the dimension of the indexer doesn't have to matter at all for the indexing behavior! Observe:

ind_a = xr.DataArray([0, 1], dims=["a"]
da[ind_a, ind_a]

Gives:

<xarray.DataArray (a: 2)>
array([0, 5])
Coordinates:
    x        (a) int32 0 1
    y        (a) <U1 'a' 'b'
Dimensions without coordinates: a

The same goes for the orthogonal example:

ind_a = xr.DataArray([0, 1], dims=["a"])
ind_b = xr.DataArray([0, 1], dims=["b"])
da[ind_a, ind_b]

Result:

<xarray.DataArray (a: 2, b: 2)>
array([[0, 2],
       [4, 6]])
Coordinates:
    x        (a) int32 0 1
    y        (b) <U1 'a' 'c'
Dimensions without coordinates: a, b

The difference is purely in terms of "labeling", as in this case you end up with dimensions without coordinates.

Fancy indexing

Generally stated, I personally do not find "fancy indexing" the most intuitive concept. I did find this example in NEP 21 pretty clarifying: https://numpy.org/neps/nep-0021-advanced-indexing.html

Specifically, this:

Consider indexing a 2D array by two 1D integer arrays, e.g., x[[0, 1], [0, 1]]:

  • Outer indexing is equivalent to combining multiple integer indices with itertools.product(). The result in this case is another 2D array with all combinations of indexed elements, e.g., np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]])

  • Vectorized indexing is equivalent to combining multiple integer indices with zip(). The result in this case is a 1D array containing the diagonal elements, e.g., np.array([x[0, 0], x[1, 1]]).

Back to xarray

da[ind_x, ind_y]

Can also be written as:

da.isel(x=ind_x, y=ind_y)

The dimensions are implicit in the order. However, xarray still attempts to broadcast (based on dimension labels), so da[ind_y] mismatches and results in an error. da[ind_a] and da[ind_b] both work.

More dimensions

The dims you provide for the indexer are what determines the shape of the output, not the dimensions of the array you're indexing.

If you want to select single values along the dimensions (so we're zip()-ing through the indexes simultaneously), just make sure that your indexers share the dimension, here for a 3D array:

da = xr.DataArray(
    data=np.arange(3 * 4 * 5).reshape(3, 4, 5),
    coords={
        "x": [1, 2, 3],
        "y": ["a", "b", "c", "d"],
        "z": [1.0, 2.0, 3.0, 4.0, 5.0],
        },
    dims=["x", "y", "z"],
)

ind_along_x = xr.DataArray([0, 1], dims=["new_index"])
ind_along_y = xr.DataArray([0, 2], dims=["new_index"])
ind_along_z = xr.DataArray([0, 3], dims=["new_index"])

da[ind_along_x, ind_along_y, ind_along_z]

Note that the values of the indexers do not have to the same -- that would be a pretty severe limitation, after all.

Result:

<xarray.DataArray (new_index: 2)>
array([ 0, 33])
Coordinates:
    x        (new_index) int32 1 2
    y        (new_index) <U1 'a' 'c'
    z        (new_index) float64 1.0 4.0
Dimensions without coordinates: new_index

Upvotes: 2

Related Questions