Maximilian

Reputation: 8450

Concise way to filter data in xarray

I need to apply a very simple 'match statement' to the values in an xarray array:

  1. Where the value > 0, set it to 2
  2. Where the value == 0, keep it at 0
  3. Where the value is NaN, keep it as NaN

Here's my current solution. I'm using NaNs, .fillna, and type coercion in lieu of 2D indexing.

valid = date_by_items.notnull()
positive = date_by_items > 0
positive = positive * 2
result = positive.fillna(0.).where(valid)
result

This changes this:

In [20]: date_by_items = xr.DataArray(np.asarray((list(range(3)) * 10)).reshape(6,5), dims=('date','item'))
    ...: date_by_items
    ...: 
Out[20]: 
<xarray.DataArray (date: 6, item: 5)>
array([[0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0],
       [1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0],
       [1, 2, 0, 1, 2]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4

... to this:

Out[22]: 
<xarray.DataArray (date: 6, item: 5)>
array([[ 0.,  2.,  2.,  0.,  2.],
       [ 2.,  0.,  2.,  2.,  0.],
       [ 2.,  2.,  0.,  2.,  2.],
       [ 0.,  2.,  2.,  0.,  2.],
       [ 2.,  0.,  2.,  2.,  0.],
       [ 2.,  2.,  0.,  2.,  2.]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4

In pandas, df[df > 0] = 2 would be enough. Surely I'm doing something pedestrian and there's a terser way?

Upvotes: 16

Views: 13897

Answers (4)

Dmitry Deryabin

Reputation: 1578

Another concise option is to modify the underlying NumPy array in place: date_by_items.values[date_by_items.values > 0] = 2
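
A minimal sketch of this in-place approach, assuming the data is already in memory as a NumPy array (not dask-backed) and that mutating the original is acceptable. The sample values and the name `arr` are made up for illustration. Because `NaN > 0` evaluates to False, NaN entries are left untouched.

```python
import numpy as np
import xarray as xr

# Hypothetical sample data containing zeros, positive values and a NaN.
arr = xr.DataArray(
    np.array([[0.0, 1.0, np.nan], [2.0, 0.0, 3.0]]),
    dims=("date", "item"),
)

# Boolean-mask assignment on the underlying NumPy array modifies arr in place.
# NaN > 0 is False, so the NaN entry is not selected by the mask.
arr.values[arr.values > 0] = 2

print(arr.values)
```

Work on `arr.copy()` instead if the original values are still needed.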

Upvotes: 1

Stijn

Reputation: 97

You can indeed use the where(condition, other) method, but be aware that the other argument is used where the condition is False. The behavior in the other answers is therefore incorrect: they put a 2 wherever date_by_items > 0 does not hold.

>>> import numpy as np
>>> import xarray as xr
>>> date = list(range(0,6))
>>> item = list(range(0,5))
>>> date_by_items = xr.DataArray(np.asarray((list(range(3)) * 10)).reshape(6,5), coords=[date, item], dims=('date','item'))
>>> date_by_items
<xarray.DataArray (date: 6, item: 5)>
array([[0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0],
       [1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0],
       [1, 2, 0, 1, 2]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4


>>> date_by_items.where(date_by_items > 0, 2)  # wrong behavior
<xarray.DataArray (date: 6, item: 5)>
array([[2, 1, 2, 2, 1],
       [2, 2, 1, 2, 2],
       [1, 2, 2, 1, 2],
       [2, 1, 2, 2, 1],
       [2, 2, 1, 2, 2],
       [1, 2, 2, 1, 2]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4

Instead, to get the requested behavior, either invert the condition or use the top-level xr.where(condition, x, y) function.

>>> date_by_items.where(date_by_items <= 0, 2)  # inverted condition
<xarray.DataArray (date: 6, item: 5)>
array([[0, 2, 2, 0, 2],
       [2, 0, 2, 2, 0],
       [2, 2, 0, 2, 2],
       [0, 2, 2, 0, 2],
       [2, 0, 2, 2, 0],
       [2, 2, 0, 2, 2]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4

>>> xr.where(date_by_items > 0, 2, date_by_items)
<xarray.DataArray (date: 6, item: 5)>
array([[0, 2, 2, 0, 2],
       [2, 0, 2, 2, 0],
       [2, 2, 0, 2, 2],
       [0, 2, 2, 0, 2],
       [2, 0, 2, 2, 0],
       [2, 2, 0, 2, 2]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4
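
The two variants agree on the integer example, but they may differ once NaNs are present, which matters for the question's third rule. A small sketch with made-up float data (the name `arr` is hypothetical): comparisons against NaN are always False, so the inverted condition replaces NaN with 2, while the three-argument form passes it through.

```python
import numpy as np
import xarray as xr

# Hypothetical sample data: a zero, a positive value and a NaN.
arr = xr.DataArray(np.array([0.0, 1.0, np.nan]), dims=("item",))

# Inverted condition: NaN <= 0 is False, so the NaN is replaced by 2 as well.
inverted = arr.where(arr <= 0, 2)

# xr.where: NaN > 0 is False, so the NaN falls through to the original value.
three_arg = xr.where(arr > 0, 2, arr)

print(inverted.values)   # [0. 2. 2.]
print(three_arg.values)  # [ 0.  2. nan]
```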

Upvotes: 0

Maximilian

Reputation: 8450

xarray now supports .where(condition, other), so this is now valid:

result = date_by_items.where(date_by_items > 0, 2)
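
Note that .where(cond, other) keeps the values where cond is True and writes other everywhere else, so the mask has to mark the values to keep rather than the ones to replace. A hedged sketch of mapping the positive values to 2 with the method form, using the sample data from the question; the name `keep` is only illustrative.

```python
import numpy as np
import xarray as xr

# Same sample data as in the question.
date_by_items = xr.DataArray(
    np.asarray(list(range(3)) * 10).reshape(6, 5),
    dims=("date", "item"),
)

# .where(cond, other) keeps values where cond is True and uses `other` elsewhere,
# so the mask selects the values to KEEP: everything that is not positive.
keep = ~(date_by_items > 0)
result = date_by_items.where(keep, 2)

print(result.values[0])  # [0 2 2 0 2]
```

Negating the `> 0` mask, rather than testing `<= 0`, also leaves NaN entries selected as "keep" on float data.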

Upvotes: 17

shoyer

Reputation: 9593

If you are happy to load your data into memory as a NumPy array, you can modify the DataArray values in place with NumPy:

date_by_items.values[date_by_items.values > 0] = 2

The cleanest way to handle this would be if xarray supported the other argument to where, but we haven't implemented that yet (hopefully soon -- the groundwork has been laid!). When that works, you'll be able to write date_by_items.where(date_by_items > 0, 2).

Either way, you'll need to do this twice to apply both your criteria.

Upvotes: 5
