Reputation: 537
I want to swap two coordinate dimensions for a multidimensional coordinate so that I can group by time.month and subtract another dataset.
import pandas as pd
import xarray as xr

ds = xr.Dataset()
# DataArray indexed by 'init_time' and an offset, 'tau'
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
# multidimensional coordinate 'time'
ds.coords['time'] = ds['init_time'] + ds['tau']
# what I'd like to be able to do (swap_dims does not accept this)
ds.swap_dims({('init_time', 'tau'): 'time'})
ds
The result I'm after is something like this:
clim = pd.Series([2], index=[1]).rename_axis('month')
df = ds.to_dataframe().reset_index()
df['month'] = df['time'].dt.month
df = (
    pd.DataFrame(
        df.set_index(['init_time', 'tau', 'time', 'month'])['tst']
        - clim))
df
Upvotes: 2
Views: 1601
Reputation: 15452
The issue with this is that swapping dims would result in duplicate values in the index. Ideally, you'd be able to do a groupby on a multi-dimensional coordinate. You can currently do this, but it's not fully featured (for example, you can't do ds.groupby('time.month').mean(dim='time')). This looks like it may be in the works (see #324, #2525).
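To make the duplicate problem concrete, here is a minimal sketch using the data from the question (the names flat_times, n_total and n_unique are only for illustration). Flattening the 2-D time coordinate shows that several (init_time, tau) pairs land on the same timestamp, so time can't act as a unique index:

```python
import pandas as pd
import xarray as xr

# Same coordinates as in the question.
tst = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})

# 2-D valid time: one value per (init_time, tau) pair.
time = tst['init_time'] + tst['tau']

# Flatten and count distinct timestamps.
flat_times = pd.DatetimeIndex(time.values.ravel())
n_total = flat_times.size        # 6 values
n_unique = flat_times.nunique()  # only 4 of them are distinct
print(n_total, n_unique, flat_times.has_duplicates)
```

2017-01-03 and 2017-01-04 each appear twice, once per init_time.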
Right now, I think you have two options. You could do this in pandas:
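df = ds.to_dataframe().reset_index()
# other_dims stands in for whichever other column(s) you want to group by
monthly_mean = (
    df
    .groupby([df.other_dims, df.time.dt.month])
    .mean()[['tst']]
    .to_xarray())
clim = xr.DataArray([2], dims=['month'], coords=[[1]])
# the month level inherits the name 'time', so rename it before subtracting
anom = monthly_mean.rename({'time': 'month'}) - clim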
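For a version that runs end to end, here is a rough sketch on the question's dataset. It assumes you only want to group by month (no other grouping column), and it uses a pandas Series for the climatology as in the question. Treat it as one way to do it rather than the definitive approach:

```python
import pandas as pd
import xarray as xr

# Rebuild the dataset from the question.
ds = xr.Dataset()
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
ds.coords['time'] = ds['init_time'] + ds['tau']

# Flatten to a table; the 2-D 'time' coordinate becomes a regular column.
df = ds.to_dataframe().reset_index()
df['month'] = df['time'].dt.month

# Monthly mean of 'tst', indexed by month.
monthly_mean = df.groupby('month')['tst'].mean()

# Climatology from the question: a value of 2 for month 1.
clim = pd.Series([2.0], index=pd.Index([1], name='month'))

# Both Series are indexed by 'month', so the subtraction aligns on it.
anom = monthly_mean - clim

# Back to xarray if needed.
anom_da = anom.rename('tst').to_xarray()
print(anom_da)
```

All six timestamps fall in January here, so the result has a single month entry.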
Alternatively, you could keep it in xarray by stacking init_time and tau:
In [35]: stacked = ds.stack(obs=('init_time', 'tau'))
In [36]: stacked.coords['obs_num'] = ('obs', ), np.arange(len(stacked.obs))
In [37]: stacked.coords['time'] = ('obs', ), stacked.init_time + stacked.tau
In [38]: swapped = stacked.swap_dims({'obs': 'obs_num'})
In [39]: swapped
Out[39]:
<xarray.Dataset>
Dimensions: (obs_num: 150)
Coordinates:
time (obs_num) datetime64[ns] 2017-01-01 2017-01-02 ... 2017-02-03
obs (obs_num) object (Timestamp('2017-01-01 00:00:00', freq='D'), Timedelta('0 days 00:00:00')) ... (Timestamp('2017-01-30 00:00:00', freq='D'), Timedelta('4 days 00:00:00'))
* obs_num (obs_num) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
Data variables:
tst (obs_num) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
In [47]: swapped.groupby(swapped.time.dt.month).mean(dim='obs_num')
Out[47]:
<xarray.Dataset>
Dimensions: (month: 2)
Coordinates:
* month (month) int64 1 2
Data variables:
tst (month) float64 71.56 145.0
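If what you want is the per-observation anomaly rather than the monthly mean, the same stacking idea can be sketched as below. This is a variation on the session above, not a copy of it: it groups directly along the stacked obs dimension instead of swapping to obs_num, uses the small dataset and climatology from the question, and unstacks at the end to get init_time and tau back. I'd expect this to work on reasonably recent xarray versions, but I haven't checked every release:

```python
import pandas as pd
import xarray as xr

# Rebuild the dataset from the question.
ds = xr.Dataset()
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
ds.coords['time'] = ds['init_time'] + ds['tau']

# Climatology indexed by month, as in the pandas option above.
clim = xr.DataArray([2], dims=['month'], coords={'month': [1]})

# Stacking makes 'time' a 1-D coordinate along the new 'obs' dimension.
stacked = ds.stack(obs=('init_time', 'tau'))
month = stacked['time'].dt.month  # 1-D DataArray named 'month'

# Monthly mean over all observations.
monthly_mean = stacked['tst'].groupby(month).mean()

# Groupby arithmetic subtracts the matching month of clim from each
# observation; unstack restores the (init_time, tau) layout.
anom = (stacked['tst'].groupby(month) - clim).unstack('obs')
anom = anom.transpose('init_time', 'tau')
print(anom)
```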
Upvotes: 2