Reputation: 163
Is there a way to grid a "row format" dataset into an xarray Dataset without using a loop?
Specifically, I would like an array where all values in the (lat, lon, time) grid are 0 if no value is specified. I am aware of the .to_xarray() method in pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_xarray.html), but it does not generate full coverage of the desired coordinates (lat, lon, time). Toy example below:
import pandas as pd
import xarray as xr
# row data to be gridded
data = {'lats':[0,0,2], 'lons':[1,2,0], 'times':[0,1,2], 'values':[20,50,30]}
df_rows = pd.DataFrame(data)
# desired coordinates to grid onto:
lat = [0,1,2]
lon = [0,1,2]
time= [0,1,2]
# general form of the desired output Dataset
df_grid = xr.Dataset(data_vars={'data':(('lon','lat','time'), df_rows)},
coords={'lat': lat,
'lon': lon,
'time':time})
Upvotes: 0
Views: 1230
Reputation: 609
The method xr.Dataset.from_dataframe does this:
import pandas as pd
import xarray as xr
# row data to be gridded
data = {"lat": [0, 0, 2], "lon": [1, 2, 0], "time": [0, 1, 2], "values": [20, 50, 30]}
df_rows = pd.DataFrame(data).set_index(["time", "lon", "lat"])
ds = xr.Dataset.from_dataframe(df_rows)
ds is an xarray.Dataset with three dimensions/coordinates (time, lon, lat) and one variable, values, which is a 3x3x2 block of data (lat has size 2 because only the values 0 and 2 occur in the rows).
Setting time, lon and lat as the index of df_rows is essential, as these columns will then be understood as coordinates.
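To see why the index matters, here is a small sketch comparing the result with and without set_index (ds_flat and ds_grid are names made up for this illustration). Without the index, xarray only has the default integer index to work with, so lat, lon and time should end up as ordinary data variables along a single dimension called index:

```python
import pandas as pd
import xarray as xr

data = {"lat": [0, 0, 2], "lon": [1, 2, 0], "time": [0, 1, 2], "values": [20, 50, 30]}
df = pd.DataFrame(data)

# No set_index: the default RangeIndex becomes the only dimension ("index"),
# and lat/lon/time are stored as plain data variables, not coordinates.
ds_flat = xr.Dataset.from_dataframe(df)
print(list(ds_flat.dims), sorted(ds_flat.data_vars))

# With a MultiIndex: each index level becomes a dimension with its own coordinate.
ds_grid = xr.Dataset.from_dataframe(df.set_index(["time", "lon", "lat"]))
print(dict(ds_grid.sizes), list(ds_grid.data_vars))
```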
Note that this method will fill your variable values with NaN for the coordinate tuples where values is not specified.
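Since the question asks for 0 instead of NaN and for full coverage of the desired coordinates (including lat=1, which never appears in the rows), one possible follow-up is to reindex onto the target coordinates and then fill the gaps. This is only a sketch: ds_full is a name introduced here, and the lat/lon/time lists are the ones from the question. Note that the variable has to be accessed as ds["values"], because ds.values is already a Dataset method.

```python
import pandas as pd
import xarray as xr

# Row data from the question, with columns named after the target dimensions.
data = {"lat": [0, 0, 2], "lon": [1, 2, 0], "time": [0, 1, 2], "values": [20, 50, 30]}
df_rows = pd.DataFrame(data).set_index(["time", "lon", "lat"])

# Desired coordinates to grid onto (taken from the question).
lat = [0, 1, 2]
lon = [0, 1, 2]
time = [0, 1, 2]

ds = xr.Dataset.from_dataframe(df_rows)

# reindex adds the coordinate values missing from the rows (here lat=1);
# fillna then replaces every NaN, pre-existing or newly added, with 0.
ds_full = ds.reindex(lat=lat, lon=lon, time=time).fillna(0)

# Optional: go back to an integer dtype, which was lost when NaN was introduced.
ds_full["values"] = ds_full["values"].astype(int)

print(dict(ds_full.sizes))
print(int(ds_full["values"].sel(time=0, lon=1, lat=0)))
```

If a specific dimension order is needed, something like ds_full.transpose("lat", "lon", "time") should rearrange it without changing the data.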
Upvotes: 1