Flatten/ravel/collapse 3-dimensional xr.DataArray (Xarray) into 2 dimensions along an axis?

Question

I have a dataset in which I store replicates for different classes/subtypes (not sure what to call them), along with a set of measured attributes for each one. Concretely, there are 5 subtypes/classes, 4 replicates for each subtype/class, and 100 measured attributes.

Is there an xarray method, analogous to np.ravel or ndarray.flatten, that merges two dimensions into one?

In this case, I want to merge the dims subtype and replicates so that I end up with a 2D array (or a pd.DataFrame) of attributes vs. subtype/replicates.

The merged labels don't need any particular format such as "coord_1 | coord_2", but it would be useful if the original coord names were kept. Maybe something like groupby could do this? Groupby always confuses me, so a native xarray solution would be awesome.

import xarray as xr
import numpy as np

# Set up xr.DataArray
dims = (5, 4, 100)
DA_data = xr.DataArray(np.random.random(dims), dims=["subtype", "replicates", "attributes"])
DA_data.coords["subtype"] = ["subtype_%d" % i for i in range(dims[0])]
DA_data.coords["replicates"] = ["rep_%d" % i for i in range(dims[1])]
DA_data.coords["attributes"] = ["attr_%d" % i for i in range(dims[2])]

# DA_data.coords
# Coordinates:
#   * subtype     (subtype)
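For comparison, here is a minimal sketch of the plain NumPy route the question alludes to, assuming the same 5 x 4 x 100 layout. The names n_subtype, n_rep, n_attr, flat and row_labels are illustrative and not from the question. A C-order reshape merges the first two axes, but NumPy carries no labels, so they have to be rebuilt by hand:

```python
import numpy as np

# Illustrative sizes matching the question: 5 subtypes x 4 replicates x 100 attributes
n_subtype, n_rep, n_attr = 5, 4, 100
data = np.random.random((n_subtype, n_rep, n_attr))

# Merge the first two axes; in C order the replicate index varies fastest
flat = data.reshape(n_subtype * n_rep, n_attr)  # shape (20, 100)

# NumPy keeps no coordinate labels, so the row labels must be tracked manually
row_labels = [("subtype_%d" % s, "rep_%d" % r)
              for s in range(n_subtype) for r in range(n_rep)]

# Row k of flat corresponds to subtype k // n_rep and replicate k % n_rep
k = 5
assert np.array_equal(flat[k], data[k // n_rep, k % n_rep])
```

This bookkeeping is exactly what an xarray-native approach should take care of.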

shoyer · Accepted Answer

Yes, this is exactly what .stack is for:

In [33]: stacked = DA_data.stack(desired=['subtype', 'replicates'])

In [34]: stacked
Out[34]:
<xarray.DataArray (attributes: 100, desired: 20)>
array([[ 0.54020268,  0.14914837,  0.83398895, ...,  0.25986503,
         0.62520466,  0.08617668],
       [ 0.47021735,  0.10627027,  0.66666478, ...,  0.84392176,
         0.64461418,  0.4444864 ],
       [ 0.4065543 ,  0.59817851,  0.65033094, ...,  0.01747058,
         0.94414244,  0.31467342],
       ...,
       [ 0.23724934,  0.61742922,  0.97563316, ...,  0.62966631,
         0.89513904,  0.20139552],
       [ 0.21157447,  0.43868899,  0.77488211, ...,  0.98285015,
         0.24367352,  0.8061804 ],
       [ 0.21518079,  0.234854  ,  0.18294781, ...,  0.64679141,
         0.49678393,  0.32215219]])
Coordinates:
  * attributes  (attributes) |S7 'attr_0' 'attr_1' 'attr_2' 'attr_3' ...
  * desired     (desired) object ('subtype_0', 'rep_0') ...
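A small sketch for checking what .stack does, assuming a reasonably recent xarray and NumPy (variable names such as rows and restored are illustrative). The new dimension is appended at the end, the stacked values follow the same order as a C-order NumPy reshape, and .unstack reverses the operation:

```python
import numpy as np
import xarray as xr

# Illustrative data with the same layout as the question
n_subtype, n_rep, n_attr = 5, 4, 100
da = xr.DataArray(
    np.random.random((n_subtype, n_rep, n_attr)),
    dims=["subtype", "replicates", "attributes"],
    coords={
        "subtype": ["subtype_%d" % i for i in range(n_subtype)],
        "replicates": ["rep_%d" % i for i in range(n_rep)],
        "attributes": ["attr_%d" % i for i in range(n_attr)],
    },
)

# stack appends the new 'desired' dimension after the remaining ones
stacked = da.stack(desired=["subtype", "replicates"])
print(stacked.dims)  # ('attributes', 'desired')

# Reorder so that each row is one (subtype, replicates) pair
rows = stacked.transpose("desired", "attributes")

# Same values, in the same order, as a C-order NumPy reshape
same_as_reshape = np.array_equal(
    rows.values, da.values.reshape(n_subtype * n_rep, n_attr)
)

# unstack undoes the stack; the restored dims are appended at the end,
# so transpose back to the original order before comparing
restored = stacked.unstack("desired").transpose(*da.dims)
```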

The resulting stacked coordinate is backed by a pandas.MultiIndex, whose values are tuples of the original coordinate labels:

In [35]: stacked['desired'].values
Out[35]:
array([('subtype_0', 'rep_0'), ('subtype_0', 'rep_1'),
       ('subtype_0', 'rep_2'), ('subtype_0', 'rep_3'),
       ('subtype_1', 'rep_0'), ('subtype_1', 'rep_1'),
       ('subtype_1', 'rep_2'), ('subtype_1', 'rep_3'),
       ('subtype_2', 'rep_0'), ('subtype_2', 'rep_1'),
       ('subtype_2', 'rep_2'), ('subtype_2', 'rep_3'),
       ('subtype_3', 'rep_0'), ('subtype_3', 'rep_1'),
       ('subtype_3', 'rep_2'), ('subtype_3', 'rep_3'),
       ('subtype_4', 'rep_0'), ('subtype_4', 'rep_1'),
       ('subtype_4', 'rep_2'), ('subtype_4', 'rep_3')], dtype=object)
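Since the question also asks about a pd.DataFrame, here is a sketch, assuming pandas is installed alongside xarray (the variable names are illustrative), of reading the MultiIndex back out through .indexes and converting the 2D result with .to_pandas():

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data with the same layout as the question
n_subtype, n_rep, n_attr = 5, 4, 100
da = xr.DataArray(
    np.random.random((n_subtype, n_rep, n_attr)),
    dims=["subtype", "replicates", "attributes"],
    coords={
        "subtype": ["subtype_%d" % i for i in range(n_subtype)],
        "replicates": ["rep_%d" % i for i in range(n_rep)],
        "attributes": ["attr_%d" % i for i in range(n_attr)],
    },
)
stacked = da.stack(desired=["subtype", "replicates"])

# .indexes exposes the pandas index behind each dimension
mi = stacked.indexes["desired"]
level_names = list(mi.names)  # ['subtype', 'replicates']
first_label = mi[0]           # ('subtype_0', 'rep_0')
subtype_of_each_row = mi.get_level_values("subtype")

# 2D DataArray -> DataFrame: rows are (subtype, replicates) pairs, columns are attributes
df = stacked.transpose("desired", "attributes").to_pandas()

# All replicate rows for one subtype, selected on the first MultiIndex level
one_subtype = df.loc["subtype_2"]  # 4 replicates x 100 attributes
```

Transposing first makes each (subtype, replicates) pair a row; calling stacked.to_pandas() without the transpose gives the other orientation, with attributes as rows and the MultiIndex as columns.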
