vernigan

Reputation: 21

Adding dimensions/coordinates to variables in xarray

I am converting an Excel file to an xarray Dataset, and I am having trouble assigning dimensions to my variables.

When converting from a Pandas dataframe to xarray, I end up with something like the following:

<xarray.Dataset>
Dimensions:  (index: 10160)
Coordinates:
  * index      (index) int64 0 1 2 3 ... 10156 10157 10158 10159
Data variables:
    DATE       (index) datetime64[ns] 2003-08-21 2003-08-21 ... 2021-08-14
    TIME       (index) object 2315 2315 316 ... 1816 1949 1949
    LATITUDE   (index) float64 64.07 64.07 64.07 ... 65.64 65.64
    LONGITUDE  (index) float64 -164.6 -164.6 ... -168.3 -168.3
    salinity   (index) float64 nan nan nan ... 31.83 30.48 30.49
    temp       (index) float64 nan nan nan ... 2.474 9.171 9.092

To change some of the Data Variables into coordinates I use the following code:

(
 ds
 .assign_coords({"index": ds.TIME.values})
 .assign_coords({"date": ds.DATE.values})
 .assign_coords({"longitude": ds.LONGITUDE.values})
 .assign_coords({"latitude": ds.LATITUDE.values})
 .drop("TIME")
 .drop("DATE")
 .drop("LONGITUDE")
 .drop("LATITUDE")
 .rename_dims({"index":"time"})
 .rename({"index":"time"})
)

This solves part of the problem: time, date, latitude, and longitude now appear as both dimensions and coordinates, and the arbitrary index has been renamed to time. However, salinity and temp still depend only on time:

<xarray.Dataset>
Dimensions:  (time: 10160, date: 10160, longitude: 10160, latitude: 10160)
Coordinates:
    time       (time) object 2315 2315 316 ... 1816 1949 1949
    date       (date) datetime64[ns] 2003-08-21 2003-08-21 ... 2021-08-14
    longitude  (longitude) float64 -164.6 -164.6 ... -168.3 -168.3
    latitude   (latitude) float64 64.07 64.07 64.07 ... 65.64 65.64
Data variables:
    salinity   (time) float64 nan nan nan ... 31.83 30.48 30.49
    temp       (time) float64 nan nan nan ... 2.474 9.171 9.092

One way I tried to address this was by forming a multi-index with the 4 dimensions that the variables should be associated with:

midx = pd.MultiIndex.from_arrays([ds.LATITUDE.values, ds.LONGITUDE.values, ds.TIME.values, ds.DATE.values], names = ['latitude','longitude','time','date'])
ds['midx'] = midx

(
 ds
 .assign_coords({"index": ds.midx.values})
 .assign_coords({"time": ds.TIME.values})
 .assign_coords({"date": ds.DATE.values})
 .assign_coords({"longitude": ds.LONGITUDE.values})
 .assign_coords({"latitude": ds.LATITUDE.values})
 .drop("TIME")
 .drop("DATE")
 .drop("LONGITUDE")
 .drop("LATITUDE")
 .drop("midx")
 .rename_dims({"index":"midx"})
 .rename({"index":"midx"})
)

However, this results in variables whose only dimension is midx, rather than the 4 desired dimensions. How can I connect the dimensions to the variables, e.g. salinity(latitude, longitude, time, date)?

Upvotes: 2

Views: 1152

Answers (1)

Michael Delgado

Reputation: 15432

Once you’ve assigned the multi index as a dimension, you can unstack it with xr.Dataset.unstack:

unstacked = ds.unstack("midx")

However, I expect that date and time are not independent dimensions; they are just two pieces of information about a single datetime dimension, which should be present in the array. If your data should only be 3D, make sure the MultiIndex includes only the three dims that actually index the data.
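As a rough illustration of the unstack route, here is a minimal sketch on made-up data. The dimension name "obs" and the sample values are assumptions for the example, not taken from your file, and I build the MultiIndex with Dataset.set_index rather than assigning a pandas MultiIndex by hand:

```python
import pandas as pd
import xarray as xr

# Made-up observations: one entry per (time, latitude, longitude) sample.
# "obs" is a hypothetical name for the flat row dimension.
times = pd.to_datetime(
    ["2003-08-21 23:15", "2003-08-21 23:15", "2021-08-14 03:16"]
)
ds = xr.Dataset(
    data_vars={"salinity": ("obs", [31.83, 30.48, 30.49])},
    coords={
        "time": ("obs", times),
        "latitude": ("obs", [64.07, 65.64, 64.07]),
        "longitude": ("obs", [-164.6, -164.6, -164.6]),
    },
)

# Combine the three coordinates into a MultiIndex along "obs" ...
stacked = ds.set_index(obs=["time", "latitude", "longitude"])

# ... then expand that MultiIndex into three separate dimensions.
# Combinations with no observation are filled with NaN.
cube = stacked.unstack("obs")
print(cube["salinity"].dims)
print(cube["salinity"].shape)
```

Note that unstacking only works if every (time, latitude, longitude) combination occurs at most once, so duplicated rows would need to be dropped or aggregated first.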

Alternatively, this could all be achieved by setting the correct indices as a multiindex on the pandas dataframe prior to converting to xarray:

df.set_index(["time", "latitude", "longitude"]).to_xarray()
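Since the columns in the question are upper-case and split into DATE and TIME, a slightly fuller sketch of the pandas route might look like the following. The sample rows are made up, and I am assuming TIME holds HHMM values (so 316 means 03:16); adjust the parsing if that guess is wrong:

```python
import pandas as pd

# Made-up rows mimicking the columns shown in the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2003-08-21", "2003-08-21", "2021-08-14"]),
    "TIME": pd.Series([2315, 2315, 316], dtype=object),
    "LATITUDE": [64.07, 65.64, 64.07],
    "LONGITUDE": [-164.6, -164.6, -164.6],
    "salinity": [31.83, 30.48, 30.49],
    "temp": [2.474, 9.171, 9.092],
})

# Assumption: TIME is HHMM, so split it into hours and minutes
# and add both to DATE to get a single datetime column.
hhmm = df["TIME"].astype(int)
df["time"] = (
    df["DATE"]
    + pd.to_timedelta(hhmm // 100, unit="h")
    + pd.to_timedelta(hhmm % 100, unit="min")
)

# Index by the three columns that locate each sample, then convert.
# to_xarray() turns each index level into a dimension.
tidy = (
    df.rename(columns={"LATITUDE": "latitude", "LONGITUDE": "longitude"})
    .drop(columns=["DATE", "TIME"])
    .set_index(["time", "latitude", "longitude"])
)
ds = tidy.to_xarray()
print(ds)
```

As with unstack, the index has to be unique for the conversion to work, and the resulting grid can be very sparse if most time/position combinations were never sampled.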

Upvotes: 1
