Reputation: 1066
I am facing serious difficulties in understanding how the xarray.groupby really works. I am trying to apply a given function "f" over each group of a xarray DatasetGroupBy collection, such that "f" should add new variables to each of the applied groups of the original xr.DataSet.
My problem is commonly found in geoscience, remote sensing, etc.
The aim is to apply a given function over an Array, pixel by pixel (or gridcell by gridcell).
Let's assume that I want to evaluate the wind speed components (u,v) of a wind-field for a given region in respect to a new direction. Therefore, I whish to evaluate rotated version of the 'u' and 'v components, namely: u_rotated and v_rotated.
Let's assume that this new direction is 30° rotated anti-clockwise in respect to each pixel position in the wind-field. So the new wind components would be (u_30_degrees and v_30_degrees).
My first attempt was to stack each of the x and y coordinates (or longitudes and latitudes) into a new dimension called pixel, and later groupby by this new dimension ("pixel") and apply a function which would do the vector-wind rotation.
Here is a snippet of my initial attempt:
# First, let's create some functions for vector rotation:
def rotate_2D_vector_per_given_degrees(array2D, angle=30):
'''
Parameters
----------
array2D : 1D length 2 numpy array
angle : float angle in degrees (optional)
DESCRIPTION. The default is 30.
Returns
-------
Rotated_2D_Vector : 1D of length 2 numpy array
'''
R = get_rotation_matrix(rotation = angle)
Rotated_2D_Vector = np.dot(R, array2D)
return Rotated_2D_Vector
def get_rotation_matrix(rotation=90):
'''
Description:
This function creates a rotation matrix given a defined rotation angle (in degrees)
Parameters:
rotation: in degrees
Returns:
rotation matrix
'''
theta = np.radians(rotation) # degrees
c, s = np.cos(theta), np.sin(theta)
R = np.array(((c, -s), (s, c)))
return R
# Then let's create a reproducible dataset for analysis:
u_wind = xr.DataArray(np.ones( shape=(20, 30)),
dims=('x', 'y'),
coords={'x': np.arange(0, 20),
'y': np.arange(0, 30)},
name='u')
v_wind = xr.DataArray(np.ones( shape=(20, 30))*0.3,
dims=('x', 'y'),
coords={'x': np.arange(0, 20),
'y': np.arange(0, 30)},
name='v')
data = xr.merge([u_wind, v_wind])
# Let's create the given function that will be applied per each group in the dataset:
def rotate_wind(array, degrees=30):
# This next line, I create a 1-dimension vector of length 2,
# with wind speed of the u and v components, respectively.
# The best solution I found has been conver the dataset into a single xr.DataArray
# by stacking the 'u' and 'v' components into a single variable named 'wind'.
vector = array.to_array(dim='wind').values
# Now, I rotate the wind vector given a rotation angle in degrees
Rotated = rotate_2D_vector_per_given_degrees(vector, degrees)
# Ensuring numerical division problems as 1e-17 == 0.
Rotated = np.where( np.abs(Rotated - 6.123234e-15) < 1e-15, 0, Rotated)
# sanity check for each point position:
print('Coords: ', array['point'].values,
'Wind Speed: ', vector,
'Response :', Rotated,
end='\n\n'+'-'*20+'\n')
components = [a for a in data.variables if a not in data.dims]
for dim, value in zip(components, Rotated):
array['{0}_rotated_{1}'.format(dim, degrees)] = value
return array
# Finally, lets stack our dataset per grid-point, groupby this new dimension, and apply the desired function:
stacked = data.stack(point = ['x', 'y'])
stacked = stacked.groupby('point').apply(rotate_wind)
# lets unstack the data to recover the original dataset:
data = stacked.unstack('point')
# Let's check if the function worked correctly
data.to_dataframe().head(30)
Though the above example is apparently working, I am still unsure if its results are correct, or even if the groupby-apply function implementation is efficient (clean, non-redundant, fast, etc.).
Any insights are most welcome!
Sincerely,
Upvotes: 3
Views: 2820
Reputation: 3542
You can merely multiply the whole array by the rotation matrice, something like np.dot(R, da)
.
So, if you have the following Dataset
:
>>> dims = ("x", "y")
>>> sizes = (20, 30)
>>> ds = xr.Dataset(
data_vars=dict(u=(dims, np.ones(sizes)), v=(dims, np.ones(sizes) * 0.3)),
coords={d: np.arange(s) for d, s in zip(dims, sizes)},
)
>>> ds
<xarray.Dataset>
Dimensions: (x: 20, y: 30)
Coordinates:
* x (x) int64 0 1 2 3 4 ... 16 17 18 19
* y (y) int64 0 1 2 3 4 ... 26 27 28 29
Data variables:
u (x, y) float64 1.0 1.0 ... 1.0 1.0
v (x, y) float64 0.3 0.3 ... 0.3 0.3
Converted, like you did, to the following DataArray
:
>>> da = ds.stack(point=["x", "y"]).to_array(dim="wind")
>>> da
<xarray.DataArray (wind: 2, point: 600)>
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3]])
Coordinates:
* point (point) MultiIndex
- x (point) int64 0 0 0 0 ... 19 19 19 19
- y (point) int64 0 1 2 3 ... 26 27 28 29
* wind (wind) <U1 'u' 'v'
Then, you have your rotated values thanks to np.dot(R, da)
:
>>> np.dot(R, da).shape
(2, 600)
>>> type(np.dot(R, da))
<class 'numpy.ndarray'>
But it is a numpy ndarray
. So if you want to go back to a xarray DataArray
, you can use a trick like that (other solutions may exist):
def rotate(da, dim, angle):
# Put dim first
dims_orig = da.dims
da = da.transpose(dim, ...)
# Rotate
R = rotation_matrix(angle)
rotated = da.copy(data=np.dot(R, da), deep=True)
# Rename values of "dim" coord according to rotation
rotated[dim] = [f"{orig}_rotated_{angle}" for orig in da[dim].values]
# Transpose back to orig
return rotated.transpose(*dims_orig)
And use it like:
>>> da_rotated = rotate(da, dim="wind", angle=30)
>>> da_rotated
<xarray.DataArray (wind: 2, point: 600)>
array([[0.7160254 , 0.7160254 , 0.7160254 , ..., 0.7160254 , 0.7160254 ,
0.7160254 ],
[0.75980762, 0.75980762, 0.75980762, ..., 0.75980762, 0.75980762,
0.75980762]])
Coordinates:
* point (point) MultiIndex
- x (point) int64 0 0 0 0 ... 19 19 19 19
- y (point) int64 0 1 2 3 ... 26 27 28 29
* wind (wind) <U12 'u_rotated_30' 'v_rotated_30'
Eventually, you can go back to the original Dataset
structure like that:
>>> ds_rotated = da_rotated.to_dataset(dim="wind").unstack(dim="point")
>>> ds_rotated
<xarray.Dataset>
Dimensions: (x: 20, y: 30)
Coordinates:
* x (x) int64 0 1 2 3 ... 17 18 19
* y (y) int64 0 1 2 3 ... 27 28 29
Data variables:
u_rotated_30 (x, y) float64 0.716 ... 0.716
v_rotated_30 (x, y) float64 0.7598 ... 0.7598
Upvotes: 3