JWB

Reputation: 198

How to reshape xarray dataset by collapsing coordinate

I currently have a dataset that, when opened with xarray, contains three coordinates: x, y, and band. The band coordinate holds temperature and dewpoint, each at 4 different time intervals, for 8 bands in total. Is there a way to reshape this into x, y, band, time, so that the band coordinate has length 2 and the time coordinate has length 4?

I thought I could add a new coordinate named time and then add the bands in, but

ds = ds.assign_coords(time=[1,2,3,4])

returns ValueError: cannot add coordinates with new dimensions to a DataArray.

Upvotes: 0

Views: 709

Answers (1)

Michael Delgado

Reputation: 15452

You can re-assign the "band" coordinate to a MultiIndex:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])

In [5]: stacked = da.copy()
   ...: stacked.coords['band'] = pd.MultiIndex.from_arrays(
   ...:     [
   ...:         [1, 1, 1, 1, 2, 2, 2, 2],
   ...:         pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
   ...:     ],
   ...:     names=['band_stacked', 'time'],
   ...: )

In [6]: stacked
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
         8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
        [3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
         4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
        [7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
         7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
        [5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
         6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
       [[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
         2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
        [9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
         6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
        [6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
         5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
        [7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
         5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
  * band          (band) MultiIndex
  - band_stacked  (band) int64 1 1 1 1 2 2 2 2
  - time          (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y

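Then you can expand the dimensionality by unstacking, and rename the band_stacked level back to band:

In [7]: unstacked = stacked.unstack('band').rename({'band_stacked': 'band'})
   ...: unstacked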
Out[7]:
<xarray.DataArray (x: 4, y: 4, band: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
          5.23808010e-01],
         [8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
          1.54739786e-02]],
...
        [[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
          6.23766131e-02],
         [5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
          2.44385584e-01]]]])
Coordinates:
  * band     (band) int64 1 2
  * time     (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
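
If you would rather not assign a pandas MultiIndex directly, the same index can probably be built from two ordinary coordinates with set_index. This is only a sketch, assuming the bands are ordered band-first (all four times for band 1, then all four for band 2). The name labelled and the dates are placeholders, not part of your dataset:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy data standing in for the real dataset: 8 bands along the last axis.
da = xr.DataArray(
    np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8),
    dims=['x', 'y', 'band'],
)
times = pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'])

# Attach one label per band for each level of the future MultiIndex.
labelled = da.assign_coords(
    band_stacked=('band', np.repeat([1, 2], 4)),  # 1 1 1 1 2 2 2 2
    time=('band', np.tile(times.values, 2)),      # t1 t2 t3 t4 t1 t2 t3 t4
)

# Combine the two coordinates into a MultiIndex on 'band', then unstack it.
stacked = labelled.set_index(band=['band_stacked', 'time'])
unstacked = stacked.unstack('band').rename({'band_stacked': 'band'})
```

Here unstacked has dims (x, y, band, time), the same layout as the output above.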

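Another, more manual option is to reshape in numpy and create a new DataArray. This relies on the bands being ordered band-first (all four times for band 1, then all four for band 2), as in the MultiIndex above. Note that this manual reshape is much faster for a larger array: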

In [8]: reshaped = xr.DataArray(
   ...:     da.data.reshape((4, 4, 2, 4)),
   ...:     dims=['x', 'y', 'band', 'time'],
   ...:     coords={
   ...:         'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
   ...:         'band': [1, 2],
   ...:     },
   ...: )
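
The reshape only gives the right answer if the new axes are listed in the order the bands are actually stored. The sketch below compares the band-first layout used above with a hypothetical time-first layout (t1/band 1, t1/band 2, t2/band 1, ...); that second layout is an assumption for illustration, so check which one your file uses:

```python
import numpy as np
import pandas as pd
import xarray as xr

nx, ny, n_band, n_time = 4, 4, 2, 4
data = np.arange(nx * ny * n_band * n_time, dtype=float).reshape(nx, ny, n_band * n_time)
da = xr.DataArray(data, dims=['x', 'y', 'band'])
coords = {
    'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
    'band': [1, 2],
}

# Band-first storage: original index = band * n_time + time.
band_major = xr.DataArray(
    da.data.reshape((nx, ny, n_band, n_time)),
    dims=['x', 'y', 'band', 'time'],
    coords=coords,
)

# Time-first storage (hypothetical): original index = time * n_band + band.
# Split the axis as (time, band), then reorder the dimensions.
time_major = xr.DataArray(
    da.data.reshape((nx, ny, n_time, n_band)),
    dims=['x', 'y', 'time', 'band'],
    coords=coords,
).transpose('x', 'y', 'band', 'time')
```

Both results have dims (x, y, band, time), but they map the original 8 bands to different (band, time) pairs, so picking the wrong layout silently mislabels the data.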

Note that if your data is chunked (and assuming you'd like to keep it that way) your options are a bit more limited - see the dask docs on reshaping dask arrays. The first (MultiIndexing unstack) approach does work with dask arrays as long as the arrays are not chunked along the unstacked dimension. See this question for an example.

Upvotes: 1
