Reputation: 8450
I need to apply a very simple 'match statement' to the values in an xarray array: where the value is greater than 0, make it 2; where it is 0, make it 0; where it is NaN, make NaN.
Here's my current solution. I'm using NaNs, .fillna, & type coercion in lieu of 2d indexing.
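valid = date_by_items.notnull()  # True where the value is not NaN
positive = date_by_items > 0  # boolean mask of the positive values
positive = positive * 2  # True becomes 2, False becomes 0
result = positive.fillna(0.).where(valid)  # put NaN back where the input was NaN
result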
This changes this:
In [20]: date_by_items = xr.DataArray(np.asarray((list(range(3)) * 10)).reshape(6,5), dims=('date','item'))
...: date_by_items
...:
Out[20]:
<xarray.DataArray (date: 6, item: 5)>
array([[0, 1, 2, 0, 1],
[2, 0, 1, 2, 0],
[1, 2, 0, 1, 2],
[0, 1, 2, 0, 1],
[2, 0, 1, 2, 0],
[1, 2, 0, 1, 2]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4
... to this:
Out[22]:
<xarray.DataArray (date: 6, item: 5)>
array([[ 0., 2., 2., 0., 2.],
[ 2., 0., 2., 2., 0.],
[ 2., 2., 0., 2., 2.],
[ 0., 2., 2., 0., 2.],
[ 2., 0., 2., 2., 0.],
[ 2., 2., 0., 2., 2.]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4
In pandas, df[df > 0] = 2 would be enough. Surely I'm doing something pedestrian and there's a terser way?
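For comparison, here is a minimal sketch of the pandas idiom mentioned above, on a small hypothetical DataFrame (the column names and values are made up for illustration). Positive entries become 2, while zeros and NaN are left alone, because NaN > 0 evaluates to False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: zeros, positive values, and one NaN.
df = pd.DataFrame({"a": [0.0, 1.0, np.nan], "b": [2.0, 0.0, 1.0]})

# Boolean-mask assignment: every cell where the mask is True is set to 2, in place.
df[df > 0] = 2

print(df)
```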
Upvotes: 16
Views: 13897
Reputation: 1578
Another concise way would be to do date_by_items.values[date_by_items.values > 0] = 2
Upvotes: 1
Reputation: 97
You can indeed use the where(condition, other) method, but be aware that the other argument is used where the condition is false. So the behavior in the other answers is incorrect: they put a 2 wherever date_by_items > 0 does not hold.
>>> import numpy as np
>>> import xarray as xr
>>> date = list(range(0,6))
>>> item = list(range(0,5))
>>> date_by_items = xr.DataArray(np.asarray((list(range(3)) * 10)).reshape(6,5), coords=[date, item], dims=('date','item'))
>>> date_by_items
<xarray.DataArray (date: 6, item: 5)>
array([[0, 1, 2, 0, 1],
[2, 0, 1, 2, 0],
[1, 2, 0, 1, 2],
[0, 1, 2, 0, 1],
[2, 0, 1, 2, 0],
[1, 2, 0, 1, 2]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4
>>> date_by_items.where(date_by_items > 0, 2) # wrong behavior
<xarray.DataArray (date: 6, item: 5)>
array([[2, 1, 2, 2, 1],
[2, 2, 1, 2, 2],
[1, 2, 2, 1, 2],
[2, 1, 2, 2, 1],
[2, 2, 1, 2, 2],
[1, 2, 2, 1, 2]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4
To get the requested behavior, either invert the condition or use the xarray.where(condition, x, y) function.
>>> date_by_items.where(date_by_items <= 0, 2) # inverted condition
<xarray.DataArray (date: 6, item: 5)>
array([[0, 2, 2, 0, 2],
[2, 0, 2, 2, 0],
[2, 2, 0, 2, 2],
[0, 2, 2, 0, 2],
[2, 0, 2, 2, 0],
[2, 2, 0, 2, 2]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4
>>> xr.where(date_by_items > 0, 2, date_by_items)
<xarray.DataArray (date: 6, item: 5)>
array([[0, 2, 2, 0, 2],
[2, 0, 2, 2, 0],
[2, 2, 0, 2, 2],
[0, 2, 2, 0, 2],
[2, 0, 2, 2, 0],
[2, 2, 0, 2, 2]])
Coordinates:
* date (date) int64 0 1 2 3 4 5
* item (item) int64 0 1 2 3 4
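The question also asks for NaN to stay NaN, which the integer example above does not exercise. The following is a minimal sketch on a small hypothetical float array (the values are made up for illustration) comparing the variants. The functional form keeps NaN, but an inverse written as `<= 0` does not, because any comparison with NaN is False; negating the original condition with `~` avoids that.

```python
import numpy as np
import xarray as xr

# Hypothetical sample data: the values from the question plus one NaN.
da = xr.DataArray(
    np.array([[0.0, 1.0, 2.0], [np.nan, 0.0, 1.0]]),
    dims=("date", "item"),
)

# NaN > 0 is False, so NaN falls through to the original value and stays NaN.
functional = xr.where(da > 0, 2, da)

# Negating the original condition also keeps NaN, because ~(NaN > 0) is True.
negated = da.where(~(da > 0), 2)

# Writing the inverse as `<= 0` does not: NaN <= 0 is False, so NaN becomes 2.
inverted = da.where(da <= 0, 2)

print(functional.values)
print(negated.values)
print(inverted.values)
```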
Upvotes: 0
Reputation: 8450
xarray now supports .where(condition, other), so this is now valid:
result = date_by_items.where(date_by_items > 0, 2)
Upvotes: 17
Reputation: 9593
If you are happy to load your data in-memory as a NumPy array, you can modify the DataArray values in place with NumPy:
date_by_items.values[date_by_items.values > 0] = 2
The cleanest way to handle this would be if xarray supported the other argument to where, but we haven't implemented that yet (hopefully soon -- the groundwork has been laid!). When that works, you'll be able to write date_by_items.where(date_by_items > 0, 2).
Either way, you'll need to do this twice to apply both your criteria.
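As a rough sketch of the in-place NumPy approach, assuming the DataArray is backed by a plain in-memory NumPy array (the sample values below are hypothetical): the mask assignment writes through to the DataArray, and in this sample the zeros and the NaN are untouched because they do not satisfy `> 0`.

```python
import numpy as np
import xarray as xr

# Hypothetical sample data: zeros, positive values, and one NaN.
da = xr.DataArray(
    np.array([[0.0, 1.0, 2.0], [np.nan, 0.0, 1.0]]),
    dims=("date", "item"),
)

# For NumPy-backed data, .values is the underlying array, so assigning
# through a boolean mask modifies the DataArray itself.
da.values[da.values > 0] = 2

print(da.values)
```

Unlike the where-based forms, this mutates date_by_items rather than returning a new array, so take a copy first if the original values are still needed.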
Upvotes: 5