Ben

Reputation: 163

Gridding Pandas DataFrame to Multi-Dimensional Xarray Dataset?

Is there a way to grid a "row format" dataset into an xarray Dataset without using a loop?

Specifically, I would like an array where all values in the (lat, lon, time) grid are 0 if no value is specified. I am aware of the .to_xarray() method in pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_xarray.html), but this will not generate full coverage of the desired coordinates (lat, lon, time). Toy example below:

import pandas as pd
import xarray as xr 

# row data to be gridded
data = {'lats':[0,0,2], 'lons':[1,2,0], 'times':[0,1,2], 'values':[20,50,30]}
df_rows = pd.DataFrame(data)
# desired coordinates to grid onto:
lat = [0,1,2]
lon = [0,1,2]
time= [0,1,2]
# general form of the desired output Dataset
# (pseudocode: df_rows stands in for the gridded 3-D array of values)
df_grid = xr.Dataset(data_vars={'data':(('lon','lat','time'), df_rows)},
                     coords={'lat': lat,
                             'lon': lon,
                             'time':time})

Upvotes: 0

Views: 1230

Answers (1)

cyril

Reputation: 609

The method xr.Dataset.from_dataframe does this:

import pandas as pd
import xarray as xr

# row data to be gridded
data = {"lat": [0, 0, 2], "lon": [1, 2, 0], "time": [0, 1, 2], "values": [20, 50, 30]}
df_rows = pd.DataFrame(data).set_index(["time", "lon", "lat"])

ds = xr.Dataset.from_dataframe(df_rows)

ds is an xarray.Dataset with three dimensions/coordinates (time, lon, lat) and one variable, values, which is a 3x3x2 block of data (lat has only two distinct values, 0 and 2, in the input rows).

Setting time, lon and lat as the index of df_rows is essential, because the index levels are what xarray turns into coordinates.

Note that this method fills the variable values with NaN for the coordinate tuples where no value is specified.
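To get what the question asks for (full coverage of lat, lon and time, with 0 instead of NaN), one option is to reindex the result onto the desired coordinates and then fill the gaps. The following is a minimal sketch rather than the only way to do it; the name ds_full is mine, and it assumes the target coordinates are the lat, lon and time lists from the question.

```python
import pandas as pd
import xarray as xr

# row data to be gridded (same toy data as above)
data = {"lat": [0, 0, 2], "lon": [1, 2, 0], "time": [0, 1, 2], "values": [20, 50, 30]}
df_rows = pd.DataFrame(data).set_index(["time", "lon", "lat"])
ds = xr.Dataset.from_dataframe(df_rows)

# desired coordinates to grid onto (taken from the question)
lat = [0, 1, 2]
lon = [0, 1, 2]
time = [0, 1, 2]

# reindex adds the missing coordinate values (here lat=1) as NaN;
# fillna(0) then replaces every NaN, old and new, with 0
ds_full = ds.reindex(lat=lat, lon=lon, time=time).fillna(0)

# use ds_full["values"], since ds_full.values is the mapping method
print(ds_full["values"].sizes)
```

Because the intermediate array holds NaN, the variable ends up as float; add .astype(int) at the end if you need integers.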

Upvotes: 1
