user308827

Reputation: 21961

Converting a pandas DataFrame to an xarray Dataset

   Unnamed: 0       index    datetime  ...   cVI     Reg       average_temp
0           0  2000-01-01  2000-01-01  ...   NaN  Central           -5.883996
1           1  2000-01-02  2000-01-02  ...   NaN  Central           -6.715087
2           2  2000-01-03  2000-01-03  ...   NaN  Central           -6.074254
3           3  2000-01-04  2000-01-04  ...   NaN  Central           -4.222387
4           4  2000-01-05  2000-01-05  ...   NaN  Central           -0.994825

I want to convert the DataFrame above to an xarray Dataset, with datetime as the index. I tried this:

ds = xr.Dataset.from_dataframe(df)

but the datetime column does not end up as the index. How do I do that?

Upvotes: 2

Views: 7471

Answers (2)

tdy

Reputation: 41327

First convert the desired column with pd.to_datetime, then make it the index with df.set_index:

df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')

#             cVI      Reg  average_temp
# datetime                              
# 2000-01-01  NaN  Central     -5.883996
# 2000-01-02  NaN  Central     -6.715087
# 2000-01-03  NaN  Central     -6.074254
# 2000-01-04  NaN  Central     -4.222387
# 2000-01-05  NaN  Central     -0.994825

Then your xr.Dataset.from_dataframe code will work as expected:

ds = xr.Dataset.from_dataframe(df)

# <xarray.Dataset>
# Dimensions:       (datetime: 5)
# Coordinates:
#   * datetime      (datetime) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-05
# Data variables:
#     cVI           (datetime) float64 nan nan nan nan nan
#     Reg           (datetime) object 'Central' 'Central' ... 'Central' 'Central'
#     average_temp  (datetime) float64 -5.884 -6.715 -6.074 -4.222 -0.9948

Or as Michael said, convert it from the pandas side using df.to_xarray:

ds = df.to_xarray()

# <xarray.Dataset>
# Dimensions:       (datetime: 5)
# Coordinates:
#   * datetime      (datetime) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-05
# Data variables:
#     cVI           (datetime) float64 nan nan nan nan nan
#     Reg           (datetime) object 'Central' 'Central' ... 'Central' 'Central'
#     average_temp  (datetime) float64 -5.884 -6.715 -6.074 -4.222 -0.9948
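
As a quick check that the two routes above agree, here is a minimal sketch. The three-row frame is hypothetical, modelled on the question's datetime and average_temp columns, and it only compares the two results rather than showing anything about the asker's full data.

```python
import pandas as pd
import xarray as xr

# Hypothetical three-row frame modelled on the question's columns
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
        "average_temp": [-5.883996, -6.715087, -6.074254],
    }
).set_index("datetime")

# Route 1: build the Dataset from the xarray side
ds_from_xr = xr.Dataset.from_dataframe(df)

# Route 2: build the Dataset from the pandas side
ds_from_pd = df.to_xarray()

# Compare values, dimensions and coordinates of the two results
same = ds_from_xr.equals(ds_from_pd)
print(same)
print(ds_from_pd["average_temp"].dims)
```

Either call should be fine once the index is set; which one to use is mostly a matter of which library you are already working in.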

Upvotes: 1

Michael Delgado

Reputation: 15432

xarray will treat the index in a dataframe as the dimensions of the resulting dataset. A MultiIndex will be unstacked such that each level will form a new orthogonal dimension in the result.
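
To illustrate the MultiIndex behaviour, here is a small sketch. The data are made up, and the "North" region is an assumption added only so that the second index level has more than one value.

```python
import pandas as pd

# Hypothetical data: two dates x two regions ("North" is invented for illustration)
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02"]
        ),
        "Reg": ["Central", "North", "Central", "North"],
        "average_temp": [-5.9, -7.1, -6.7, -8.0],
    }
)

# A two-level MultiIndex: each level becomes its own dimension
ds = df.set_index(["datetime", "Reg"]).to_xarray()

dims = dict(ds.sizes)                # dimension name -> length
var_dims = ds["average_temp"].dims   # dimensions of the data variable
print(dims)
print(var_dims)
```

The four rows of the frame end up as a 2 x 2 array indexed by datetime and Reg. Index combinations that are missing from the frame would be filled with NaN.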

To convert your data to xarray, first set the datetime column as the index in pandas with df.set_index('datetime'), then convert:

ds = df.set_index('datetime').to_xarray()

Alternatively, you could promote it afterwards with ds.set_coords('datetime') and then swap the indexing dimension with ds.swap_dims. Both methods return a new Dataset rather than modifying ds in place, so assign the result:

ds = df.to_xarray()
ds = ds.set_coords('datetime').swap_dims({'index': 'datetime'})

I'd recommend the first option, but the second also works if you already have your data as a Dataset and want to swap the index.
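
For comparison, here is a minimal sketch of both options side by side. The frame is hypothetical and has a default, unnamed index, which is assumed here to come through as a dimension called "index".

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex and a datetime column
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
        "average_temp": [-5.883996, -6.715087, -6.074254],
    }
)

# Option 1: set the index in pandas, then convert
ds_first = df.set_index("datetime").to_xarray()

# Option 2: convert first, then promote the variable and swap dimensions
ds_second = (
    df.to_xarray()
    .set_coords("datetime")
    .swap_dims({"index": "datetime"})
)

print(ds_first["average_temp"].dims)
print(ds_second["average_temp"].dims)
print(list(ds_second.coords))
```

One difference to be aware of: with the second option the old "index" values stay on the Dataset as a non-dimension coordinate, which you can remove with drop_vars("index") if you do not need them.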

Upvotes: 1
