Reputation: 1600
I am a pandas user migrating to xarray because I work with geospatial 3D data. There are some things I only know how to do in pandas, and it often makes no sense to convert to a pandas DataFrame and then back to an xarray Dataset.
What I am trying to do is replace the current dimension of an xarray object with two new ones, both of which are currently data variables in that object.
We start from data, an xarray Dataset like this:
<xarray.Dataset>
Dimensions: (index: 9)
Coordinates:
* index (index) int64 0 1 2 3 4 5 6 7 8
Data variables:
Letter (index) object 'A' 'A' 'A' 'B' 'B' 'B' 'C' 'C' 'C'
Number (index) int64 1 2 3 1 2 3 1 2 3
Value1 (index) float64 0.5453 1.184 -1.177 0.8232 ... -1.253 0.3274 -1.583
Value2 (index) float64 -0.4184 -0.3325 0.6826 ... -0.264 0.07381 0.4357
What I want is to reshape and reindex the variables Value1 and Value2 so that Letter and Number become their dimensions.
The way I usually do this is:
reindexed = data.to_dataframe().set_index(['Letter','Number']).to_xarray()
That returns:
<xarray.Dataset>
Dimensions: (Letter: 3, Number: 3)
Coordinates:
* Letter (Letter) object 'A' 'B' 'C'
* Number (Number) int64 1 2 3
Data variables:
Value1 (Letter, Number) float64 0.5453 1.184 -1.177 ... 0.3274 -1.583
Value2 (Letter, Number) float64 -0.4184 -0.3325 0.6826 ... 0.07381 0.4357
This works well when the data is not too big, but it seems wasteful because converting to a DataFrame loads everything into memory. I would like a faster, lighter way to do the same thing using xarray only.
To help reproduce the problem, here is some code that creates data similar to what I get after reading the NetCDF file.
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1,2,3,1,2,3,1,2,3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()
Upvotes: 3
Views: 9711
Reputation: 3407
You should be able to do this using the code below. You cannot remove dimensions in xarray, so you will have to replace the values of "index" with the values of Letter or Number first, and then rename the index dimension.
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1,2,3,1,2,3,1,2,3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()
(
data
.assign_coords({"index": data.Letter.values})
.assign_coords({"Number":data.Number.values})
.drop_vars("Letter")
.rename_dims({"index":"Letter"})
.rename({"index":"Letter"})
)
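As an alternative to the rename chain above, here is a minimal sketch using xarray's `set_index` followed by `unstack`. It assumes every (Letter, Number) pair occurs exactly once, as in the sample data; missing pairs would be filled with NaN. Whether it is lighter on memory for large or dask-backed data is something to check on your own dataset, since building the multi-index still loads the Letter and Number values.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Letter": "A A A B B B C C C".split(),
    "Number": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Value1": np.random.randn(9),
    "Value2": np.random.randn(9),
})
data = df.to_xarray()

# Turn the "index" dimension into a multi-index built from the two
# data variables, then unstack it into separate Letter/Number dimensions.
reindexed = data.set_index(index=["Letter", "Number"]).unstack("index")

print(reindexed)
```

Individual cells can then be selected by label, for example `reindexed["Value1"].sel(Letter="B", Number=2)`.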
Upvotes: 1