mschrimpf

Reputation: 559

multi-index DataArray in __init__

When a DataArray is created with multiple coordinates on the same dimension, xarray does not automatically build an index over those coordinates, so the following fails:

from xarray import DataArray

d = DataArray([0], coords={'coordA': ('dim', [0]), 'coordB': ('dim', [0])}, dims=['dim'])
d.sel(coordA=0)  # ValueError: dimensions or multi-index levels ['coordA'] do not exist

This is because no MultiIndex is created on dim from coordA and coordB.

Is there a way to automatically create the MultiIndex on DataArray creation?

We can create the index after object creation, but this is extremely cumbersome when creating DataArrays in many places.

d = d.set_index(dim=['coordA', 'coordB'], append=True)
d.sel(coordA=0)  # works

Before xarray 0.13, it was possible to override the DataArray.__init__ method and set the index in place, but passing inplace=True now raises an error.

class DataAssembly(DataArray):
    def __init__(self, *args, **kwargs):
        super(DataAssembly, self).__init__(*args, **kwargs)
        self.set_index(dim=['coordA', 'coordB'], append=True, inplace=True)  # no longer works since 0.13

Upvotes: 0

Views: 110

Answers (1)

Maximilian

Reputation: 8510

I think you can get what you're looking for by passing in a MultiIndex to coords:

In [26]: import pandas as pd

In [27]: import xarray as xr

In [28]: idx = pd.MultiIndex.from_arrays([[0], [0]], names=['cA', 'cB'])

In [29]: d = xr.DataArray([0], dims=['dim'], coords=dict(dim=idx))

In [30]: d
Out[30]:
<xarray.DataArray (dim: 1)>
array([0])
Coordinates:
  * dim      (dim) MultiIndex
  - cA       (dim) int64 0
  - cB       (dim) int64 0


In [31]: d.sel(cA=0)
Out[31]:
<xarray.DataArray (cB: 1)>
array([0])
Coordinates:
  * cB       (cB) int64 0
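
If DataArrays are created in many places, the index construction can be wrapped in a small factory so that call sites stay short. The following is only a sketch under assumptions: make_indexed is a hypothetical helper name (not part of xarray), it handles a single dimension, and it uses set_index from the question rather than passing a pd.MultiIndex to coords.

```python
import xarray as xr


def make_indexed(data, dim, **level_values):
    """Hypothetical helper: build a 1-D DataArray whose coordinates along
    `dim` are combined into a MultiIndex, so .sel() works on each level."""
    # Attach every keyword argument as a coordinate on the same dimension.
    coords = {name: (dim, values) for name, values in level_values.items()}
    d = xr.DataArray(data, dims=[dim], coords=coords)
    # set_index returns a new object; nothing is modified in place.
    return d.set_index({dim: list(level_values)})


d = make_indexed([10, 20], "dim", coordA=[0, 1], coordB=["x", "y"])
selected = d.sel(coordA=1)  # label-based selection on one level
```

Because the helper returns the result of set_index instead of mutating the object, it does not rely on the removed inplace argument.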

The original approach doesn't work because xarray cannot tell from the constructor call whether coordA and coordB are meant to be levels of a MultiIndex on dim or plain non-indexed coordinates, so it leaves them non-indexed.
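
To illustrate why leaving them non-indexed is a legitimate default, here is a small sketch (the sample values are made up for illustration) in which the coordinates stay non-indexed and are still used for filtering through a boolean mask instead of sel:

```python
import xarray as xr

# Two coordinates on the same dimension, with no index built over them.
d = xr.DataArray(
    [10, 20],
    dims=["dim"],
    coords={"coordA": ("dim", [0, 1]), "coordB": ("dim", ["x", "y"])},
)

# Non-indexed coordinates can still drive a selection via a boolean mask.
# Note that where() may promote integer data to float.
masked = d.where(d["coordA"] == 1, drop=True)
```

This is more verbose than sel, which is the trade-off for not committing to a MultiIndex up front.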

Does that make sense? Any feedback for what could be better?

Upvotes: 1
