Reputation: 323
I have a 3d array (10x10x3) which, for some reason, is saved as a 2d xr.DataArray (100x3). It looks a bit like this:
import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(100, 3),
                    dims=('ct', 'x'),
                    coords={'ct': range(100)})
c = [x%10 for x in range(100)]
t = [1234+x//10 for x in range(100)]
c and t are the coordinates that are bundled together in ct.
In the past I have solved the issue of separating the two dimensions as follows:
t_x_c,x = data.shape
nc = 10
data = np.reshape(data.values,(t_x_c//nc,nc, x))
But this requires a number of assumptions about the data structure that may not hold in the near future (e.g. c and t may not be as regular as in my example).
I have managed to assign c and t as additional coordinates to the array:
data2 = data.assign_coords(
    coords={"c": ("ct", c),
            "t": ("ct", t),
            },)
but I would like to promote them to dimensions of the array. How would I do that?
Upvotes: 9
Views: 6567
Reputation: 8823
One alternative is to generate c and t coordinates of length 100, as you started to do, and create a MultiIndex from them. However, this should not be necessary: providing only the desired coordinate values for c and t (lengths 10 and 10 respectively in this case) should be enough. This answer presents two alternatives that are already available in other SO answers and GitHub issues. The relevant code is included here, but for details on the implementations the original sources should be consulted.
The answer in this other question gives an example of reshaping using pure xarray methods with the following code:
reshaped_ds = ds.assign_coords(
c=np.arange(10), t=np.arange(1234, 1244)
).stack(
aux_dim=("c", "t")
).reset_index(
"ct", drop=True
).rename(
ct="aux_dim"
).unstack("aux_dim")
Note that this only works with datasets and would therefore require ds = data.to_dataset(name="aux_name") first. After reshaping, the result can be converted back to a DataArray with reshaped_ds.aux_name.
Another alternative is to generate the MultiIndex with pandas instead of having xarray create it with assign_coords + stack, as shown in this GitHub issue. This alternative is tailored to DataArrays and it even integrates the transposing to make sure the reshaped dimensions preserve the original order. For completeness, here is the code proposed in said issue to reshape DataArrays:
import pandas as pd

def xr_reshape(A, dim, newdims, coords):
    """ Reshape DataArray A to convert its dimension dim into sub-dimensions given by
    newdims and the corresponding coords.
    Example: Ar = xr_reshape(A, 'time', ['year', 'month'], [(2017, 2018), np.arange(12)]) """
    # Create a pandas MultiIndex from these labels
    ind = pd.MultiIndex.from_product(coords, names=newdims)
    # Replace the time index in the DataArray by this new index,
    A1 = A.copy()
    A1.coords[dim] = ind
    # Convert multiindex to individual dims using DataArray.unstack().
    # This changes dimension order! The new dimensions are at the end.
    A1 = A1.unstack(dim)
    # Permute to restore dimensions
    i = A.dims.index(dim)
    dims = list(A1.dims)
    # Insert the new dims where the original dim used to be ...
    for d in newdims[::-1]:
        dims.insert(i, d)
    # ... and drop the copies that unstack() appended at the end
    for d in newdims:
        _ = dims.pop(-1)
    return A1.transpose(*dims)
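Because xr_reshape labels the rows purely by position, the order of newdims and coords has to match the order in which the rows are actually stored. The sketch below only uses pandas and the c and t lists from the question to check which order that is. It assumes the question's layout (c varying fastest); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Per-row labels exactly as defined in the question.
c = [x % 10 for x in range(100)]
t = [1234 + x // 10 for x in range(100)]

# from_product varies the LAST level fastest, so for this layout
# t has to come first and c second.
ind_tc = pd.MultiIndex.from_product(
    [np.arange(1234, 1244), np.arange(10)], names=["t", "c"]
)
matches_tc = (list(ind_tc.get_level_values("c")) == c
              and list(ind_tc.get_level_values("t")) == t)

# The opposite order produces labels that do not line up with the rows.
ind_ct = pd.MultiIndex.from_product(
    [np.arange(10), np.arange(1234, 1244)], names=["c", "t"]
)
matches_ct = list(ind_ct.get_level_values("c")) == c

print(matches_tc, matches_ct)
```

For the question's data this suggests calling the function as xr_reshape(data, 'ct', ['t', 'c'], [np.arange(1234, 1244), np.arange(10)]). The same ordering concern applies to any approach that builds the index from a product of coordinate values instead of from per-row labels.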
Upvotes: 3
Reputation: 3552
You want to use a combination of the .set_index() and .unstack() methods.
Let's break it up.
First, I create the dummy array at the stage where "c" and "t" are already coordinates:
c, t = [arr.flatten() for arr in np.meshgrid(range(10), range(1234, 1234+10))]
da = xr.DataArray(
    np.random.randn(100, 3),
    dims=('ct', 'x'),
    coords={
        'c': ('ct', c),
        't': ('ct', t)
    }
)
Then, use set_index() to create a MultiIndex combining the "c" and "t" coordinates:
>>> da.set_index(ct=("c", "t"))
<xarray.DataArray (ct: 100, x: 3)>
[...]
Coordinates:
* ct (ct) MultiIndex
- c (ct) int64 0 1 2 3 4 5 6 7 8 9 0 1 2 ...
- t (ct) int64 1234 1234 1234 1234 1234 ...
Dimensions without coordinates: x
Then, use unstack() to turn the "c" and "t" levels of the "ct" MultiIndex into dimensions:
>>> da.set_index(ct=("c", "t")).unstack("ct")
<xarray.DataArray (x: 3, c: 10, t: 10)>
Coordinates:
* c (c) int64 0 1 2 3 4 5 6 7 8 9
* t (t) int64 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243
Dimensions without coordinates: x
>>> da.set_index(ct=("c", "t")).unstack("ct").dims
('x', 'c', 't')
However, as you can see, .unstack() puts the unstacked dimensions last, so you may eventually want to transpose:
>>> da.set_index(ct=("c", "t")).unstack("ct").transpose("c", "t", "x").dims
('c', 't', 'x')
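Putting the steps together, here is a minimal end-to-end sketch. It rebuilds the array with the c and t labels from the question, checks the result against the manual np.reshape approach, and then drops one row to show what happens when the (c, t) grid is incomplete. The names values, cube and partial are only illustrative.

```python
import numpy as np
import xarray as xr

# Same layout as the question: c varies fastest, t slowest.
c = np.array([i % 10 for i in range(100)])
t = np.array([1234 + i // 10 for i in range(100)])
values = np.random.randn(100, 3)

da = xr.DataArray(
    values,
    dims=("ct", "x"),
    coords={"c": ("ct", c), "t": ("ct", t)},
)

# Promote c and t to dimensions, then pick an explicit dimension order.
cube = da.set_index(ct=("c", "t")).unstack("ct").transpose("t", "c", "x")

# For this regular layout the result agrees with the manual reshape.
same_as_reshape = np.array_equal(cube.values, values.reshape(10, 10, 3))

# Irregular case: remove the first row (c=0, t=1234) before unstacking.
# The missing combination is filled with NaN, one per entry along x.
partial = da.isel(ct=slice(1, None)).set_index(ct=("c", "t")).unstack("ct")
n_missing = int(partial.isnull().sum())

print(cube.dims, cube.shape, same_as_reshape, n_missing)
```

Since the labels are attached per row, this approach does not rely on the rows being stored in any particular order, which is the assumption the np.reshape version needed.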
Upvotes: 10