Reputation: 709
I have an xarray dataset:
As you can see, the dimensions are (lat, lon, step (hours), time (days)). I want to merge the hours and days into a single dimension so that the dimensions are instead (lat, lon, timestep). How do I do this?
Upvotes: 3
Views: 1640
Reputation: 890
You can use the stack method to create a MultiIndex from the time and step dimensions. As your valid_time coord already has the correct datetime values, you can also drop the MultiIndex coords and keep only the valid_time coord with the actual datetimes.
import numpy as np
import xarray as xr
import pandas as pd
# Create a dummy representation of your data
ds = xr.Dataset(
    data_vars={"a": (("x", "y", "time", "step"), np.random.rand(5, 5, 3, 24))},
    coords={
        # "D" is the documented pandas alias for calendar-day frequency
        "time": pd.date_range(start="1999-12-31", periods=3, freq="D"),
        "step": pd.timedelta_range(start="1h", freq="h", periods=24),
    },
)
ds = ds.assign_coords(valid_time=ds.time + ds.step)
# Stack the time and step dims
stacked_ds = ds.stack(datetime=("time", "step"))
# Drop the multiindex if you want to keep only the valid_time coord which
# contains the combined date and time information.
# Rename vars and dims to your liking.
stacked_ds = (
    stacked_ds.drop_vars("datetime")
    .rename_dims({"datetime": "time"})
    .rename_vars({"valid_time": "time"})
)
print(stacked_ds)
<xarray.Dataset>
Dimensions:  (time: 72, x: 5, y: 5)
Coordinates:
  * time     (time) datetime64[ns] 1999-12-31T01:00:00 ... 2000-01-03
Dimensions without coordinates: x, y
Data variables:
    a        (x, y, time) float64 0.1961 0.3733 0.2227 ... 0.4929 0.7459 0.4106
This gives a single time dimension with a continuous datetime series as its coordinate. However, that coordinate is not an index. Some methods, such as resample, need time to be an index. We can fix that by explicitly setting it as an index:
stacked_ds.set_index(time="time")
However, this will make 'time' a variable instead of a coordinate. To make it a coordinate again, we can use set_coords:
stacked_ds.set_index(time="time").set_coords("time")
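As a hedged alternative, the index can also be created during the reshaping itself rather than afterwards. The sketch below is not part of the approach above. It assumes a reasonably recent xarray release, rebuilds the dummy data so that it runs on its own, and uses reset_index(..., drop=True) plus swap_dims in place of drop_vars, rename_dims and rename_vars. The names indexed_ds, has_index and daily_mean are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Dummy data mirroring the example above (variable names are illustrative).
ds = xr.Dataset(
    data_vars={"a": (("x", "y", "time", "step"), np.random.rand(5, 5, 3, 24))},
    coords={
        "time": pd.date_range(start="1999-12-31", periods=3, freq="D"),
        "step": pd.to_timedelta(np.arange(1, 25), unit="h"),
    },
)
ds = ds.assign_coords(valid_time=ds.time + ds.step)

indexed_ds = (
    ds.stack(datetime=("time", "step"))
    # Discard the MultiIndex together with its "time" and "step" levels.
    .reset_index("datetime", drop=True)
    # Promote valid_time to the dimension coordinate, which gives it an index.
    .swap_dims({"datetime": "valid_time"})
    # Rename both the coordinate and its dimension to "time".
    .rename({"valid_time": "time"})
)

# Check that "time" is backed by an index, then try an index-based method.
has_index = "time" in indexed_ds.indexes
daily_mean = indexed_ds.resample(time="1D").mean()
```

If has_index is True, resample should work without a separate set_index call. Behaviour around MultiIndex handling has changed between xarray versions, so treat this as something to verify against your installed release.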
You can stack dimensions on DataArrays as well. However, they do not have rename_dims and rename_vars methods. Instead, you can use swap_dims and rename:
(
    ds.a.stack(datetime=("time", "step"))
    .drop_vars("datetime")
    .swap_dims({"datetime": "time"})
    .rename({"valid_time": "time"})
).set_index(time="time")
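To see what stacking does to a DataArray's values and timestamps, here is a minimal self-contained check. It is only a sketch: the array is random dummy data, and the names da, stacked_da, same_values and same_times are assumptions introduced for illustration. It compares the stacked result against a plain NumPy reshape and against time + step computed by broadcasting.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative inputs: 3 daily start times and 24 hourly steps.
time = pd.date_range("1999-12-31", periods=3, freq="D")
step = pd.to_timedelta(np.arange(1, 25), unit="h")
values = np.random.rand(5, 5, 3, 24)

da = xr.DataArray(
    values,
    dims=("x", "y", "time", "step"),
    coords={"time": time, "step": step},
)
da = da.assign_coords(valid_time=da.time + da.step)

# Stacking flattens (time, step) into one trailing dimension, time-major.
stacked_da = da.stack(datetime=("time", "step"))

# The data should match a plain reshape of the last two axes.
same_values = np.array_equal(stacked_da.values, values.reshape(5, 5, 72))

# The stacked valid_time should match time + step flattened in the same order.
expected_times = (time.values[:, None] + step.values[None, :]).ravel()
same_times = np.array_equal(stacked_da["valid_time"].values, expected_times)
```

Both flags being True indicates that each block of 24 consecutive entries along the stacked dimension belongs to one start time, so the combined timestamps run in chronological order.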
Upvotes: 4