Most straightforward/compact way to iteratively create a DataArray with Xarray

Question

I often write code that builds a DataArray by looping over different options (which later become coordinates). Since I started using Xarray, I've been doing this by creating a list of smaller DataArrays and then concatenating them:

import numpy as np
import xarray as xr

extradims = range(3)
size = 5

da_total = []
for cj, extradim in enumerate(extradims):
    data = np.random.normal(scale=cj, size=size)
    da = xr.DataArray(data, dims=['sample'], coords=dict(sample=range(size)))
    da_total.append(da)
dafinal = xr.concat(da_total, dim='extradim').assign_coords(extradim=extradims)

However, this seems very cumbersome compared to other things Xarray does, so I was wondering if there's an easier way I'm missing. In particular, I'd like to avoid using "external" tools (like lists and numpy) and do the whole thing using only Xarray. The closest I've gotten is this:

samples = range(5)
nans = np.full([len(extradims), len(samples)], np.nan)
dafinal2 = xr.DataArray(nans, dims=['extradim', 'sample'],
                        coords=dict(extradim=extradims, sample=samples))
for cj, extradim in enumerate(extradims):
    data = np.random.normal(scale=cj, size=len(samples))
    da = xr.DataArray(data, dims=['sample'], coords=dict(sample=samples))
    dafinal2.loc[dict(extradim=extradim)] = da

The loop is a bit more compact, but I'd like to avoid having to create an array of NaNs beforehand. I also sometimes don't know the sizes of some of the coordinates before starting the loop, so it would be nice to avoid that too.

Ideally I would be able to do something like this:

dafinal3 = xr.DataArray(dims=['extradim', 'sample'])
for cj, extradim in enumerate(extradims):
    data = np.random.normal(scale=cj, size=len(samples))
    da = xr.DataArray(data, dims=['sample'], coords=dict(sample=samples))
    dafinal3.loc[dict(extradim=extradim)] = da

But this, of course, doesn't work.

Is there a way to accomplish what I want?

paime · Accepted Answer

Edit: See this answer by an xarray main contributor on a similar question. It confirms that the patterns you are using are equally efficient.

coords is a required argument when you create a DataArray without data, because the shape is inferred from it. The DataArray will not extend its grid magically. If you really don't know the coordinates beforehand, then xr.concat is the function to use.
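
As a minimal sketch of the xr.concat route (make_slice is a hypothetical helper standing in for the real per-iteration computation), the dim argument can be a named pandas.Index. That creates the new dimension and its coordinate in one call, so the separate assign_coords step from the question is not needed:

```python
import numpy as np
import pandas as pd
import xarray as xr

extradims = [0, 1, 2]  # labels for the new dimension, as in the question
samples = range(5)


def make_slice(scale):
    """Hypothetical helper: build one 1-D slice along 'sample'."""
    data = np.random.normal(scale=scale, size=len(samples))
    return xr.DataArray(data, dims=["sample"], coords={"sample": samples})


# A named pandas.Index passed as `dim` supplies both the name of the new
# dimension and its coordinate values.
dafinal = xr.concat(
    [make_slice(scale) for scale in extradims],
    dim=pd.Index(extradims, name="extradim"),
)
```

The list of slices can be built in a loop instead of a comprehension when the number of iterations is not known in advance.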

If you know the coordinates beforehand, then you can initialize an empty DataArray, which is almost your last example.

>>> da = xr.DataArray(coords=(range(3), range(4)))
>>> da
<xarray.DataArray (dim_0: 3, dim_1: 4)>
array([[nan, nan, nan, nan],
       [nan, nan, nan, nan],
       [nan, nan, nan, nan]])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2
  * dim_1    (dim_1) int64 0 1 2 3

You may also want to choose the fill value and the names of the dimensions:

>>> da = xr.DataArray(None, coords=dict(x=range(3), y=range(4)), dims=("x", "y"))
>>> da
<xarray.DataArray (x: 3, y: 4)>
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2 3
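
Once such a container exists, the loop from the question can fill it one slice at a time with .loc. The sketch below reuses the question's dimension names and assumes the fill value is np.nan rather than None, so that the array keeps a float dtype instead of object:

```python
import numpy as np
import xarray as xr

extradims = [0, 1, 2]
samples = list(range(5))

# A scalar fill value is expanded to the shape implied by the coordinates.
dafinal = xr.DataArray(
    np.nan,
    coords={"extradim": extradims, "sample": samples},
    dims=("extradim", "sample"),
)

for extradim in extradims:
    data = np.random.normal(scale=extradim, size=len(samples))
    # Label-based assignment: overwrite the row whose 'extradim' label matches.
    dafinal.loc[{"extradim": extradim}] = data
```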

There may be cases where a better alternative would be to populate a Dataset and then to call .to_array() on it.
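
A rough sketch of that Dataset route, assuming each iteration can be given a string label (the labels "a", "b", "c" here are made up for illustration): each result is stored as its own variable, and to_array stacks the variables along a new dimension whose coordinate holds the variable names.

```python
import numpy as np
import xarray as xr

samples = list(range(5))
labels = ["a", "b", "c"]  # hypothetical labels, one per loop iteration

# Start from a Dataset that only carries the shared 'sample' coordinate.
ds = xr.Dataset(coords={"sample": samples})
for scale, label in enumerate(labels):
    data = np.random.normal(scale=scale, size=len(samples))
    ds[label] = ("sample", data)  # (dims, data) tuple adds a new variable

# Stack all data variables along a new 'extradim' dimension.
dafinal = ds.to_array(dim="extradim")
```

This avoids both the intermediate list and the preallocated array, at the cost of the new coordinate being made of variable names.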
