mschrimpf

Reputation: 559

combine complementary DataArrays

I have a list of DataArrays with three dimensions. In each item, two of the dimensions have length 1, but taken together the items cover every combination of values along those two dimensions.

import itertools
import numpy as np
import xarray as xr

ds = []
for vals_dim1, vals_dim2 in itertools.product(list(range(2)), list(range(3))):
    d = xr.DataArray(np.random.rand(1, 1, 4),
                     coords={'dim1': [vals_dim1], 'dim2': [vals_dim2], 'dim3': range(4)},
                     dims=['dim1', 'dim2', 'dim3'])
    ds.append(d)

I then want to combine these complementary DataArrays, but nothing I have tried so far works. The result should be a DataArray with shape (2, 3, 4), i.e. dimensions dim1: 2, dim2: 3, dim3: 4.

The following do not work:

# does not automatically infer dimensions and fails with
# "ValueError: conflicting sizes for dimension 'concat_dim': length 2 on 'concat_dim' and length 6 on <this-array>"
ds = xr.concat(ds, dim=['dim1', 'dim2'])

# will still try to insert a new `concat_dim` and fails with
# "ValueError: conflicting MultiIndex level name(s): 'dim1' (concat_dim), (dim1) 'dim2' (concat_dim), (dim2)"
import pandas as pd
dims = [[0] * 3 + [1] * 3, list(range(3)) * 2]
dims = pd.MultiIndex.from_arrays(dims, names=['dim1', 'dim2'])
ds = xr.concat(ds, dim=dims)

# fails with
# AttributeError: 'DataArray' object has no attribute 'data_vars'
ds = xr.auto_combine(ds)

Upvotes: 0

Views: 836

Answers (1)

shoyer

Reputation: 9593

Unfortunately (as you discovered here), you currently cannot concatenate along multiple dimensions at once in xarray.

There are a few ways to work around this. The most performant is to stack() the two dimensions of each object into a single new dimension, concatenate along that dimension, and then unstack() the result:

>>> xr.concat([d.stack(z=['dim1', 'dim2']) for d in ds], 'z').unstack('z')
<xarray.DataArray (dim3: 4, dim1: 2, dim2: 3)>
array([[[0.300328, 0.544551, 0.751339],
        [0.612358, 0.937376, 0.67688 ]],

       [[0.065146, 0.85845 , 0.962857],
        [0.102126, 0.395406, 0.245373]],

       [[0.309324, 0.362568, 0.676552],
        [0.709206, 0.719578, 0.960803]],

       [[0.613187, 0.205054, 0.021796],
        [0.434595, 0.779576, 0.937855]]])
Coordinates:
  * dim3     (dim3) int64 0 1 2 3
  * dim1     (dim1) int64 0 1
  * dim2     (dim2) int64 0 1 2

(Here z is a placeholder, really just an arbitrary name for the temporary new dimension.)
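For anyone who wants to run this end to end, here is a minimal self-contained sketch of the stack/concat/unstack approach. The variable names (pieces, stacked, combined) are only illustrative, and the final transpose() is an addition not in the snippet above: it just restores the dim1, dim2, dim3 order asked for in the question, since unstack() puts the unstacked dimensions last.

```python
import itertools

import numpy as np
import xarray as xr

# Rebuild the list from the question: one (1, 1, 4) piece per (dim1, dim2) pair.
pieces = []
for v1, v2 in itertools.product(range(2), range(3)):
    pieces.append(xr.DataArray(
        np.random.rand(1, 1, 4),
        coords={'dim1': [v1], 'dim2': [v2], 'dim3': list(range(4))},
        dims=['dim1', 'dim2', 'dim3']))

# Collapse dim1 and dim2 of every piece into a temporary dimension 'z',
# concatenate along 'z', then split 'z' back into dim1 and dim2.
stacked = [p.stack(z=['dim1', 'dim2']) for p in pieces]
combined = xr.concat(stacked, dim='z').unstack('z')

# unstack() leaves the dimensions as (dim3, dim1, dim2); reorder them
# to match the layout requested in the question.
combined = combined.transpose('dim1', 'dim2', 'dim3')
print(combined.dims, combined.shape)
```

Each piece can then be recovered by label, e.g. combined.sel(dim1=1, dim2=2) holds the values of the last piece in the list.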

Another option would be to make use of merge(). Merge is a little awkward to use with DataArray objects (we should fix that), but this would achieve the same result:

>>> xr.merge([x.rename('z') for x in ds])['z'].rename(None)
<xarray.DataArray (dim1: 2, dim2: 3, dim3: 4)>
array([[[0.300328, 0.065146, 0.309324, 0.613187],
        [0.544551, 0.85845 , 0.362568, 0.205054],
        [0.751339, 0.962857, 0.676552, 0.021796]],

       [[0.612358, 0.102126, 0.709206, 0.434595],
        [0.937376, 0.395406, 0.719578, 0.779576],
        [0.67688 , 0.245373, 0.960803, 0.937855]]])
Coordinates:
  * dim1     (dim1) int64 0 1
  * dim2     (dim2) int64 0 1 2
  * dim3     (dim3) int64 0 1 2 3

(z here is also a placeholder name.)

Note that merge uses a different algorithm from concat: it allocates a full output array for each argument, so it will be much slower for large arrays.
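As a rough sketch, the merge() route can be run as a script as follows. Two things here are assumptions on my part rather than part of the answer: compat and join are passed explicitly (with the values that were the defaults at the time) so the result does not depend on xarray's default settings, and the nested one-dimension-at-a-time concat at the end is only there as an independent cross-check of the result.

```python
import itertools

import numpy as np
import xarray as xr

# Same input as in the question: one (1, 1, 4) piece per (dim1, dim2) pair.
pieces = []
for v1, v2 in itertools.product(range(2), range(3)):
    pieces.append(xr.DataArray(
        np.random.rand(1, 1, 4),
        coords={'dim1': [v1], 'dim2': [v2], 'dim3': list(range(4))},
        dims=['dim1', 'dim2', 'dim3']))

# merge() needs named arrays; 'z' is a throwaway name that is dropped again.
# Every piece is expanded to the full (2, 3, 4) grid with NaN padding and the
# non-conflicting values are then combined, which is why this is slower.
merged = xr.merge([p.rename('z') for p in pieces],
                  compat='no_conflicts', join='outer')['z'].rename(None)

# Cross-check (not from the answer): concatenate one dimension at a time,
# first along dim2 within each dim1 value, then along dim1.
rows = []
for v1 in range(2):
    same_dim1 = [p for p in pieces if int(p['dim1'].values[0]) == v1]
    rows.append(xr.concat(same_dim1, dim='dim2'))
nested = xr.concat(rows, dim='dim1')

print(np.allclose(merged.transpose('dim1', 'dim2', 'dim3').values,
                  nested.values))
```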

Upvotes: 1
