Reputation: 139
I have an xarray DataArray with values over a rectangular 2D grid, and a list of points (pairs of coordinate values) from an arbitrary subset of that grid, stored in a pandas DataFrame.
How do I mask out values (i.e. set them to NaN) in the DataArray whose grid coordinates do not appear in the list of points?
For example, consider the DataArray
In [35]: da = xr.DataArray(data=np.random.randint(10, size=(5, 6)), coords={"x": np.linspace(0, 10, 5), "y": np.linspace(0, 12, 6)})
In [36]: da
Out[36]:
<xarray.DataArray (x: 5, y: 6)>
array([[6, 0, 2, 3, 9, 8],
[7, 6, 4, 8, 5, 8],
[7, 4, 4, 5, 4, 7],
[9, 8, 8, 1, 8, 0],
[8, 9, 4, 3, 3, 6]])
Coordinates:
* x (x) float64 0.0 2.5 5.0 7.5 10.0
* y (y) float64 0.0 2.4 4.8 7.2 9.6 12.0
and the DataFrame
In [44]: coords = pd.DataFrame([[2.5, 4.8], [2.5, 7.2], [5.0, 12.0], [7.5, 7.2], [10.0, 2.4]], columns=["x_coord", "y_coord"])
In [45]: coords
Out[45]:
x_coord y_coord
0 2.5 4.8
1 2.5 7.2
2 5.0 12.0
3 7.5 7.2
4 10.0 2.4
then I expect the output to be:
Out[84]:
<xarray.DataArray (x: 5, y: 6)>
array([[nan, nan, nan, nan, nan, nan],
[nan, nan, 4., 8., nan, nan],
[nan, nan, nan, nan, nan, 7.],
[nan, nan, nan, 1., nan, nan],
[nan, 9., nan, nan, nan, nan]])
Coordinates:
* x (x) float64 0.0 2.5 5.0 7.5 10.0
* y (y) float64 0.0 2.4 4.8 7.2 9.6 12.0
Upvotes: 1
Views: 631
Reputation: 15452
You can convert the dataframe to an xarray object by setting the x and y coordinates as the index and then calling to_xarray(). Since no data columns are left once both columns are in the index, I'll just assign a "flag" variable:
In [20]: flag = (
...: coords.assign(flag=1)
...: .set_index(["x_coord", "y_coord"])
...: .flag
...: .to_xarray()
...: .fillna(0)
...: .rename({"x_coord": "x", "y_coord": "y"})
...: )
In [21]: flag
Out[21]:
<xarray.DataArray 'flag' (x: 4, y: 4)>
array([[0., 1., 1., 0.],
[0., 0., 0., 1.],
[0., 0., 1., 0.],
[1., 0., 0., 0.]])
Coordinates:
* x (x) float64 2.5 5.0 7.5 10.0
* y (y) float64 2.4 4.8 7.2 12.0
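To deal with floating-point issues, I'll reindex the flag array onto the coordinates of the original DataArray, matching each coordinate to the nearest grid value within a small tolerance and filling grid cells that have no point with 0: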
In [22]: flag = flag.reindex(x=da.x, y=da.y, method="nearest", tolerance=1e-9, fill_value=0)
In [23]: flag
Out[23]:
<xarray.DataArray 'flag' (x: 5, y: 6)>
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 0., 0.],
[0., 0., 0., 0., 0., 1.],
[0., 0., 0., 1., 0., 0.],
[0., 1., 0., 0., 0., 0.]])
Coordinates:
* x (x) float64 0.0 2.5 5.0 7.5 10.0
* y (y) float64 0.0 2.4 4.8 7.2 9.6 12.0
This is now the same shape as your array and can be used as a mask (where keeps values where the flag is nonzero and sets everything else to NaN):
In [24]: da.where(flag)
Out[24]:
<xarray.DataArray (x: 5, y: 6)>
array([[nan, nan, nan, nan, nan, nan],
[nan, nan, 7., 0., nan, nan],
[nan, nan, nan, nan, nan, 5.],
[nan, nan, nan, 8., nan, nan],
[nan, 8., nan, nan, nan, nan]])
Coordinates:
* x (x) float64 0.0 2.5 5.0 7.5 10.0
* y (y) float64 0.0 2.4 4.8 7.2 9.6 12.0
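As an alternative to the flag/reindex route above, here is a minimal sketch that builds the boolean mask directly from integer positions. It uses pandas' Index.get_indexer with method="nearest" and a tolerance to look up each point on the grid. This is not the approach from the answer, just another way to get the same mask. It assumes the DataArray's dims are ordered (x, y), and it fixes the array values to the ones printed in the question so the result is reproducible; the names ix, iy, on_grid and mask are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Grid and points from the question (values fixed instead of random).
values = np.array([
    [6, 0, 2, 3, 9, 8],
    [7, 6, 4, 8, 5, 8],
    [7, 4, 4, 5, 4, 7],
    [9, 8, 8, 1, 8, 0],
    [8, 9, 4, 3, 3, 6],
])
da = xr.DataArray(
    values,
    dims=("x", "y"),
    coords={"x": np.linspace(0, 10, 5), "y": np.linspace(0, 12, 6)},
)
coords = pd.DataFrame(
    [[2.5, 4.8], [2.5, 7.2], [5.0, 12.0], [7.5, 7.2], [10.0, 2.4]],
    columns=["x_coord", "y_coord"],
)

# Integer position of each point along each axis.
# get_indexer returns -1 where no grid value lies within the tolerance.
ix = da.indexes["x"].get_indexer(
    coords["x_coord"].to_numpy(), method="nearest", tolerance=1e-9
)
iy = da.indexes["y"].get_indexer(
    coords["y_coord"].to_numpy(), method="nearest", tolerance=1e-9
)
on_grid = (ix >= 0) & (iy >= 0)  # drop points that are not on the grid

# Boolean mask with True at the listed grid cells (assumes dims are (x, y)).
mask = np.zeros(da.shape, dtype=bool)
mask[ix[on_grid], iy[on_grid]] = True

# Keep values at the listed points, NaN everywhere else.
masked = da.where(xr.DataArray(mask, dims=da.dims, coords=da.coords))
print(masked)
```

With the question's data this keeps 4 and 8 in the x=2.5 row, 7 at (5.0, 12.0), 1 at (7.5, 7.2) and 9 at (10.0, 2.4). Points that don't fall on the grid within the tolerance are silently skipped here; check on_grid.all() if you would rather treat that as an error.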
Just in case it's useful: if you wanted to do the opposite, that is, extract the values from the DataArray at the points given in your dataframe, you could use xarray's advanced (vectorized) indexing. Passing DataArray indexers that share a dimension to sel pulls out one value per point rather than the outer product of the x and y values:
In [28]: da.sel(
...: x=coords.x_coord.to_xarray(),
...: y=coords.y_coord.to_xarray(),
...: method="nearest",
...: tolerance=1e-9, # use a (low) tolerance to handle floating-point error
...: )
Out[28]:
<xarray.DataArray (index: 5)>
array([7, 0, 5, 8, 8])
Coordinates:
x (index) float64 2.5 2.5 5.0 7.5 10.0
y (index) float64 4.8 7.2 12.0 7.2 2.4
* index (index) int64 0 1 2 3 4
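For anyone who wants to run the point extraction end to end, here is a self-contained sketch of it. The array values are fixed to the ones printed in the question (rather than random), so the extracted numbers differ from the output above. The column name "value" and the variables points and result are just illustrative choices.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Grid and points from the question (values fixed instead of random).
values = np.array([
    [6, 0, 2, 3, 9, 8],
    [7, 6, 4, 8, 5, 8],
    [7, 4, 4, 5, 4, 7],
    [9, 8, 8, 1, 8, 0],
    [8, 9, 4, 3, 3, 6],
])
da = xr.DataArray(
    values,
    dims=("x", "y"),
    coords={"x": np.linspace(0, 10, 5), "y": np.linspace(0, 12, 6)},
)
coords = pd.DataFrame(
    [[2.5, 4.8], [2.5, 7.2], [5.0, 12.0], [7.5, 7.2], [10.0, 2.4]],
    columns=["x_coord", "y_coord"],
)

# Both indexers share the "index" dimension (taken from the dataframe's
# index), so sel returns one value per row instead of a 5x5 block.
points = da.sel(
    x=coords["x_coord"].to_xarray(),
    y=coords["y_coord"].to_xarray(),
    method="nearest",
    tolerance=1e-9,  # small tolerance to absorb floating-point error
)

# Attach the extracted values back onto the dataframe, row by row.
result = coords.assign(value=points.values)
print(result)
```

Because the result is aligned with the dataframe's index, assigning it back as a new column keeps each value next to the coordinates it came from.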
Upvotes: 2