ogb119

xarray mask outside list of coordinates

I have an Xarray DataArray with values over a rectangular 2D grid, and a pandas DataFrame containing a list of points (pairs of coordinate values) drawn from an arbitrary subset of that grid.

How do I mask out the values in the DataArray (i.e. set them to NaN) whose grid coordinates do not appear in the list of points?

For example, consider the DataArray

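In [35]: import numpy as np
    ...: import pandas as pd
    ...: import xarray as xr
    ...: da = xr.DataArray(
    ...:     data=np.random.randint(10, size=(5, 6)),
    ...:     dims=("x", "y"),
    ...:     coords={"x": np.linspace(0, 10, 5), "y": np.linspace(0, 12, 6)},
    ...: )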

In [36]: da
Out[36]: 
<xarray.DataArray (x: 5, y: 6)>
array([[6, 0, 2, 3, 9, 8],
       [7, 6, 4, 8, 5, 8],
       [7, 4, 4, 5, 4, 7],
       [9, 8, 8, 1, 8, 0],
       [8, 9, 4, 3, 3, 6]])
Coordinates:
  * x        (x) float64 0.0 2.5 5.0 7.5 10.0
  * y        (y) float64 0.0 2.4 4.8 7.2 9.6 12.0

and the DataFrame

In [44]: coords = pd.DataFrame([[2.5, 4.8], [2.5, 7.2], [5.0, 12.0], [7.5, 7.2], [10.0, 2.4]], columns=["x_coord", "y_coord"])

In [45]: coords
Out[45]: 
   x_coord   y_coord
0      2.5       4.8
1      2.5       7.2
2      5.0      12.0
3      7.5       7.2
4     10.0       2.4

then I expect the output to be:

Out[84]: 
<xarray.DataArray (x: 5, y: 6)>
array([[nan, nan, nan, nan, nan, nan],
       [nan, nan,  4.,  8., nan, nan],
       [nan, nan, nan, nan, nan,  7.],
       [nan, nan, nan,  1., nan, nan],
       [nan,  9., nan, nan, nan, nan]])
Coordinates:
  * x        (x) float64 0.0 2.5 5.0 7.5 10.0
  * y        (y) float64 0.0 2.4 4.8 7.2 9.6 12.0

Answers (1)

Michael Delgado

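You can convert the dataframe to an xarray object by setting the x and y coordinate columns as the index and then calling to_xarray. Since no data columns are left once both coordinate columns become the index, I'll just assign a "flag" variable equal to 1. Any (x, y) combination that is missing from the dataframe comes out as NaN in the resulting 2D array, so fillna(0) turns it into a 0/1 mask: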

In [20]: flag = (
    ...:     coords.assign(flag=1)
    ...:     .set_index(["x_coord", "y_coord"])
    ...:     .flag
    ...:     .to_xarray()
    ...:     .fillna(0)
    ...:     .rename({"x_coord": "x", "y_coord": "y"})
    ...: )

In [21]: flag
Out[21]:
<xarray.DataArray 'flag' (x: 4, y: 4)>
array([[0., 1., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 1., 0.],
       [1., 0., 0., 0.]])
Coordinates:
  * x        (x) float64 2.5 5.0 7.5 10.0
  * y        (y) float64 2.4 4.8 7.2 12.0
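A minimal sketch of the to_xarray step on its own, assuming the coords DataFrame from the question (flag_series and raw are illustrative names): a Series with a two-level MultiIndex is unstacked into one dimension per level, covering every combination of the unique x and y labels, and the combinations that are not in the DataFrame become NaN. That is why the chain above ends with fillna(0).

```python
import pandas as pd
import xarray  # noqa: F401  (Series.to_xarray requires xarray to be installed)

coords = pd.DataFrame(
    [[2.5, 4.8], [2.5, 7.2], [5.0, 12.0], [7.5, 7.2], [10.0, 2.4]],
    columns=["x_coord", "y_coord"],
)

# A constant column indexed by the (x_coord, y_coord) pairs
flag_series = coords.assign(flag=1).set_index(["x_coord", "y_coord"])["flag"]

# Each MultiIndex level becomes its own dimension
raw = flag_series.to_xarray()
print(raw.dims)   # ('x_coord', 'y_coord')
print(raw.shape)  # (4, 4): 4 unique x values by 4 unique y values

# 16 combinations exist, but only 5 appear in the DataFrame; the rest are NaN
print(int(raw.isnull().sum()))  # 11

flag = raw.fillna(0)
```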

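The flag array only covers the x and y values that appear in the dataframe, and its labels may not match da's exactly because of floating-point error. To deal with both, I'll reindex it onto da's coordinates, matching each label to the nearest one within a small tolerance and filling everything else with 0: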

In [22]: flag = flag.reindex(x=da.x, y=da.y, method="nearest", tolerance=1e-9, fill_value=0)

In [23]: flag
Out[23]:
<xarray.DataArray 'flag' (x: 5, y: 6)>
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 0.]])
Coordinates:
  * x        (x) float64 0.0 2.5 5.0 7.5 10.0
  * y        (y) float64 0.0 2.4 4.8 7.2 9.6 12.0
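Why the tolerance matters: np.linspace(0, 12, 6) computes its fourth value as 3 * 2.4, which rounds to a float just below the literal 7.2 stored in the dataframe, so an exact label match misses it. A small sketch of the difference, using a one-dimensional flag_y array that is purely illustrative:

```python
import numpy as np
import xarray as xr

grid_y = np.linspace(0, 12, 6)
exact_match = grid_y[3] == 7.2          # False: 3 * 2.4 rounds below the literal 7.2
close_match = np.isclose(grid_y[3], 7.2)  # True
print(exact_match, close_match)

# A 1-D flag labelled with the literal value 7.2, as it would come from the dataframe
flag_y = xr.DataArray([1.0], dims="y", coords={"y": [7.2]})

# Exact label matching: the 7.2 label finds no partner on the grid
exact = flag_y.reindex(y=grid_y, fill_value=0)

# Nearest-label matching within a tiny tolerance: the 7.2 label is recovered
nearest = flag_y.reindex(y=grid_y, method="nearest", tolerance=1e-9, fill_value=0)

print(exact.values.tolist())    # all zeros
print(nearest.values.tolist())  # 1.0 at position 3, zeros elsewhere
```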

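This is now the same shape as your array and can be used as a mask. (The values differ from the expected output in the question because da was generated with np.random.randint, so my array is not the same as yours.)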

In [24]: da.where(flag)
Out[24]:
<xarray.DataArray (x: 5, y: 6)>
array([[nan, nan, nan, nan, nan, nan],
       [nan, nan,  7.,  0., nan, nan],
       [nan, nan, nan, nan, nan,  5.],
       [nan, nan, nan,  8., nan, nan],
       [nan,  8., nan, nan, nan, nan]])
Coordinates:
  * x        (x) float64 0.0 2.5 5.0 7.5 10.0
  * y        (y) float64 0.0 2.4 4.8 7.2 9.6 12.0
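If you'd rather skip the pandas round trip, a different technique (not the one used above) is to compare every grid point with every listed point directly using NumPy broadcasting. The sketch below assumes a deterministic np.arange grid in place of the random data so the result can be checked, and relies on np.isclose's default tolerances to absorb floating-point error. Note that it builds an (x, y, point) intermediate, so memory grows with the grid size times the number of points.

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(30).reshape(5, 6),
    dims=("x", "y"),
    coords={"x": np.linspace(0, 10, 5), "y": np.linspace(0, 12, 6)},
)
coords = pd.DataFrame(
    [[2.5, 4.8], [2.5, 7.2], [5.0, 12.0], [7.5, 7.2], [10.0, 2.4]],
    columns=["x_coord", "y_coord"],
)

# Broadcast shapes: grid x -> (5, 1, 1), grid y -> (1, 6, 1), points -> (1, 1, n)
gx = da["x"].values[:, None, None]
gy = da["y"].values[None, :, None]
px = coords["x_coord"].to_numpy()[None, None, :]
py = coords["y_coord"].to_numpy()[None, None, :]

# True where a grid point matches at least one listed point (within tolerance)
hit = (np.isclose(gx, px) & np.isclose(gy, py)).any(axis=-1)

mask = xr.DataArray(
    hit, dims=("x", "y"), coords={"x": da["x"].values, "y": da["y"].values}
)
result = da.where(mask)  # NaN everywhere the mask is False
```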

Just in case it's useful: if you instead wanted to extract the values from the DataArray at the points given in your dataframe, you could use xarray's vectorized (advanced) indexing. Passing DataArray indexers that share a dimension (here the dataframe's index) selects one value per point, rather than the outer product of the x and y labels:

In [28]: da.sel(
    ...:     x=coords.x_coord.to_xarray(),
    ...:     y=coords.y_coord.to_xarray(),
    ...:     method="nearest",
    ...:     tolerance=1e-9, # use a (low) tolerance to handle floating-point error
    ...: )

Out[28]:
<xarray.DataArray (index: 5)>
array([7, 0, 5, 8, 8])
Coordinates:
    x        (index) float64 2.5 2.5 5.0 7.5 10.0
    y        (index) float64 4.8 7.2 12.0 7.2 2.4
  * index    (index) int64 0 1 2 3 4
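To illustrate what the DataArray indexers change, here is a small sketch (the deterministic np.arange data and the shared dimension name points are assumptions for illustration): plain lists are applied to each dimension independently and return the outer product, while DataArray indexers that share a dimension are paired up element by element.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(30).reshape(5, 6),
    dims=("x", "y"),
    coords={"x": np.linspace(0, 10, 5), "y": np.linspace(0, 12, 6)},
)

xs = [2.5, 5.0, 10.0]
ys = [4.8, 12.0, 2.4]

# Orthogonal (outer) indexing: every x in xs combined with every y in ys
outer = da.sel(x=xs, y=ys, method="nearest")
print(outer.shape)  # (3, 3)

# Vectorized (pointwise) indexing: (xs[i], ys[i]) pairs along a new "points" dim
pointwise = da.sel(
    x=xr.DataArray(xs, dims="points"),
    y=xr.DataArray(ys, dims="points"),
    method="nearest",
)
print(pointwise.values.tolist())  # [8, 17, 25]
```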
