Anack

Reputation: 21

Change dimensions from point coordinates (stations) to lat/lon in an xarray Dataset

I have a dataset (shown below) whose coordinates are stations, lat, lon and time. Right now it uses (stations, time) as dimensions, but I would like it to use (lat, lon, time) instead.

I looked online and found how to swap dimensions, but only examples that swap a single dimension for another one, not one dimension for two.

Any suggestions on how to do this?

<xarray.Dataset>
Dimensions:     (stations: 11, time: 7320)
Coordinates:
  * stations    (stations) int64 11425 11426 11427 11428 ... 11433 11434 11435
    lat         (stations) float64 39.54 39.36 39.24 39.07 ... 38.07 37.9 37.81
    lon         (stations) float64 -74.25 -74.4 -74.6 ... -75.19 -75.34 -75.51
  * time        (time) datetime64[ns] 2010-02-01 ... 2010-02-06T01:59:00
Data variables:
    waterlevel  (time, stations) float64 0.0002405 0.0002313 ... -0.01266

Upvotes: 2

Views: 864

Answers (1)

astoeriko

Reputation: 900

You can turn the station coordinate into a MultiIndex built from the lat and lon coordinates using set_index. In a second step, you can unstack that MultiIndex so that lat and lon become dimensions of the dataset.

Note, however, that unless the stations already lie on a regular grid, unstacking blows up the size of your dataset: every (lat, lon) grid point without a station is filled with NaN. If your 11 stations all have distinct latitudes and longitudes, for example, you end up with an 11 x 11 grid per time step in which only 11 cells hold data. For many applications, making the station dimension a MultiIndex of lat and lon should be enough.

import numpy as np
import pandas as pd
import xarray as xr

# Example dataset: 5 stations with random positions and 20 daily time steps
ds = xr.Dataset(
    data_vars={"waterlevels": (("station", "time"), np.random.rand(5, 20))},
    coords={
        "station": ("station", ["a", "b", "c", "d", "e"]),
        "lon": ("station", np.random.rand(5)),
        "lat": ("station", np.random.rand(5)),
        "time": pd.date_range(start="2021-10-05", periods=20, freq="D"),
    },
)

# Rename the station coordinate so that set_index does not overwrite it
ds = ds.rename_vars({"station": "station_id"})
# Create a MultiIndex coordinate on the station dimension from lat and lon
ds_multiindex = ds.set_index(station=["lat", "lon"])
# Unstack the MultiIndex so that lat and lon become dimensions
ds_latlon = ds_multiindex.unstack("station")
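
To get a feeling for the trade-off between the two steps, here is a rough sketch on a dataset laid out like the one in the question (a stations dimension with lat and lon attached to it). The numbers are made up, and the names station_id, stacked and gridded are only chosen for this illustration. It assumes every station has a distinct (lat, lon) pair, and it copies the station IDs into a plain coordinate with assign_coords instead of renaming them, which is just one way of keeping them around.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
n_stations, n_times = 5, 20

# Synthetic stand-in for the dataset in the question (values are invented)
ds = xr.Dataset(
    data_vars={
        "waterlevel": (("time", "stations"), rng.random((n_times, n_stations)))
    },
    coords={
        "stations": ("stations", np.arange(11425, 11425 + n_stations)),
        "lat": ("stations", np.linspace(39.5, 37.8, n_stations)),
        "lon": ("stations", np.linspace(-74.2, -75.5, n_stations)),
        "time": pd.date_range("2010-02-01", periods=n_times, freq="D"),
    },
)

# Keep the station IDs as an ordinary coordinate, then index the
# stations dimension by (lat, lon) instead of by ID
stacked = ds.assign_coords(
    station_id=("stations", ds["stations"].values)
).set_index(stations=["lat", "lon"])

# Option 1: stay with the MultiIndex and select a station by its position
lat0 = float(stacked["lat"].values[0])
lon0 = float(stacked["lon"].values[0])
first_station = stacked.sel(lat=lat0, lon=lon0)

# Option 2: unstack into a full lat x lon grid (mostly NaN for scattered stations)
gridded = stacked.unstack("stations")

n_total = gridded["waterlevel"].size
n_valid = int(gridded["waterlevel"].notnull().sum())
print(f"{n_valid} of {n_total} grid cells contain data")
```

With scattered stations the gridded version grows with the square of the number of stations while the amount of real data stays the same, which is why staying with the MultiIndex is often the more practical choice.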

Upvotes: 1
