HM14

Reputation: 709

Xarray merge separate day and hour dimensions into one time dimension in python

I have an xarray dataset:

[Image: Jupyter cell output of the xarray dataset]

As you can see the dimensions are (lat, lon, step (hours), time (days)). I want to merge the hours and days into one so that the dimensions are instead (lat, lon, timestep). How do I do this?

Upvotes: 3

Views: 1640

Answers (1)

astoeriko

Reputation: 890

Creating a one-dimensional time dimension and coordinate

You can use the stack method to create a multiindex from the time and step dimensions. As your valid_time coord already contains the correct datetimes, you can then drop the multiindex coords and keep only the valid_time coord with the actual datetimes.

import numpy as np
import xarray as xr
import pandas as pd

# Create a dummy representation of your data
ds = xr.Dataset(
    data_vars={"a": (("x", "y", "time", "step"), np.random.rand(5, 5, 3, 24))},
    coords={
        "time": pd.date_range(start="1999-12-31", periods=3, freq="D"),
        "step": pd.timedelta_range(start="1h", freq="h", periods=24),
    },
)
ds = ds.assign_coords(valid_time=ds.time + ds.step)

# Stack the time and step dims
stacked_ds = ds.stack(datetime=("time", "step"))

# Drop the multiindex if you want to keep only the valid_time coord which
# contains the combined date and time information.
# Rename vars and dims to your liking.
stacked_ds = (
    stacked_ds.drop_vars("datetime")
    .rename_dims({"datetime": "time"})
    .rename_vars({"valid_time": "time"})
)
print(stacked_ds)
<xarray.Dataset>
Dimensions:  (time: 72, x: 5, y: 5)
Coordinates:
  * time     (time) datetime64[ns] 1999-12-31T01:00:00 ... 2000-01-03
Dimensions without coordinates: x, y
Data variables:
    a        (x, y, time) float64 0.1961 0.3733 0.2227 ... 0.4929 0.7459 0.4106
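
The order of the dimensions passed to stack matters: the last one listed varies fastest along the new dimension, much like itertools.product. Here is a minimal standard-library sketch of that ordering, using the same 3 days and 24 hourly steps as the dummy data above (the variable names are only illustrative):

```python
from datetime import datetime, timedelta
from itertools import product

# Same layout as the dummy data: 3 daily start times, 24 hourly steps (1h..24h)
days = [datetime(1999, 12, 31) + timedelta(days=d) for d in range(3)]
steps = [timedelta(hours=h) for h in range(1, 25)]

# product(days, steps) lets the step vary fastest, which mirrors the
# ordering of stack(datetime=("time", "step"))
valid_times = [day + step for day, step in product(days, steps)]

print(len(valid_times), valid_times[0], valid_times[-1])
# 72 1999-12-31 01:00:00 2000-01-03 00:00:00

# With ("time", "step") the combined timestamps come out in chronological order
assert valid_times == sorted(valid_times)
```

Stacking in the opposite order, ("step", "time"), would iterate over all days for each step, so the combined timestamps would no longer be chronological.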

Making the time coordinate an index

This gives us a single time dimension with a continuous datetime series as its coordinate. However, it is not an index. For some methods, like resample, time needs to be an index. We can fix that by explicitly setting it as an index:

stacked_ds.set_index(time="time")

However, this will make 'time' a variable instead of a coordinate. To make it a coordinate again, we can use:

stacked_ds.set_index(time="time").set_coords("time")
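
To illustrate why the index matters, here is a small self-contained sketch of resample on an indexed time coordinate. Note that it does not use the stack-based steps above: as an assumption to keep it short, it builds an already flattened dataset directly with NumPy (the names flat and daily are only illustrative), so time is an indexed dimension coordinate from the start.

```python
import numpy as np
import xarray as xr

# Build the combined timestamps by hand: 3 days x 24 hourly steps (1h..24h)
days = np.datetime64("1999-12-31", "ns") + np.arange(3) * np.timedelta64(1, "D")
steps = np.arange(1, 25) * np.timedelta64(1, "h")
valid_time = (days[:, None] + steps[None, :]).ravel()  # step varies fastest

# "time" is passed as a dimension coordinate, so xarray creates an index for it
flat = xr.Dataset(
    data_vars={"a": (("x", "y", "time"), np.random.rand(5, 5, valid_time.size))},
    coords={"time": valid_time},
)

# resample relies on the time index; here we compute daily means
daily = flat.resample(time="1D").mean()
print(daily.sizes)
```

The 72 hourly values span four calendar days (1999-12-31 to 2000-01-03), so the daily result has four time steps.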

Working with DataArrays

You can stack dimensions on DataArrays as well. However, DataArrays do not have rename_dims and rename_vars methods. Instead, you can use swap_dims and rename:

(
    ds.a.stack(datetime=("time", "step"))
    .drop_vars("datetime")
    .swap_dims({"datetime": "time"})
    .rename({"valid_time": "time"})
).set_index(time="time")

Upvotes: 4
