Reputation: 757
I have a 2D array that I load from a NetCDF file using xarray
, and I want to do a sort of edge detection by comparing cell values to neighbouring cell values. I've come up with this code:
import numpy as np
import xarray as xr
d = xr.open_dataset('https://thredds.met.no/thredds/dodsC/barents25km_files/Barents-2.5km_ZDEPTHS_his.an.2021112900.nc')
# Compare values to neighbouring cells
threshold = 0.7
mask = (d.ice_concentration[0,1:-1,1:-1] < threshold) & (d.ice_concentration[0,1:-1,:-2] > threshold)
# Check shapes
print((d.ice_concentration[0,1:-1,1:-1] < threshold).shape)
print((d.ice_concentration[0,1:-1, :-2] > threshold).shape)
print(mask.shape)
Output:
(947, 737)
(947, 737)
(947, 736)
So my problem here is that I use &
to take an elementwise and
on two arrays of equal shape, and yet the result has a different shape. I assume that some magical stuff to do with the dimensions is happening behind the scenes, such that each cell is compared to itself, and not to the neighbouring cell. This is supported by the fact that np.sum(mask)
returns zero.
If I add .values
in a couple of places, I get the correct shape, and I get the result I expect where np.sum(mask)
is a number larger than zero:
# Compare values to neighbouring cells
threshold = 0.7
mask = (d.ice_concentration[0,1:-1,1:-1].values < threshold) & (d.ice_concentration[0,1:-1,:-2].values > threshold)
# Check shapes
print((d.ice_concentration[0,1:-1,1:-1] < threshold).shape)
print((d.ice_concentration[0,1:-1, :-2] > threshold).shape)
print(mask.shape)
Output:
(947, 737)
(947, 737)
(947, 737)
Is this intentional behaviour from xarray
? I suppose this behaviour might make sense if I was trying to compare different arrays with the same dimensions, but in this case it makes zero sense (at least to me). Or am I doing something wrong?
Upvotes: 1
Views: 751
Reputation: 15432
xarray uses labels along a given dimension to align the data when performing data operations. in this way, it is more similar to pandas than numpy in its handling of mis-aligned coordinates, though by default xarray always aligns data using an inner join:
In [1]: import xarray as xr, pandas as pd, numpy as np
In [2]: da = xr.DataArray(np.arange(5), dims=['level'], coords=[list('abcde')])
In [3]: da
Out[3]:
<xarray.DataArray (level: 5)>
array([0, 1, 2, 3, 4])
Coordinates:
* level (level) <U1 'a' 'b' 'c' 'd' 'e'
When looking at the two cases in your example, you're slicing the data using an offset (slice(0, length-1)
vs slice(1, length)
). When you do this, notice that the indices of "level" are no longer aligned:
In [4]: a = da[:len(da.level)-1]
...: b = da[1:len(da.level)]
In [5]: a
Out[5]:
<xarray.DataArray (level: 4)>
array([0, 1, 2, 3])
Coordinates:
* level (level) <U1 'a' 'b' 'c' 'd'
In [6]: b
Out[6]:
<xarray.DataArray (level: 4)>
array([1, 2, 3, 4])
Coordinates:
* level (level) <U1 'b' 'c' 'd' 'e'
When adding the two, or in any operation that involves broadcasting & automatic alignment (see the docs references below), missing values will be dropped from the result. Furthermore, in the results, note that the sum is the elementwise-sum where each element is aligned based on the label (b + b, c + c, d + d), not based on position (b + a, c + b, d + c).
In [7]: a + b
Out[7]:
<xarray.DataArray (level: 3)>
array([2, 4, 6])
Coordinates:
* level (level) <U1 'b' 'c' 'd'
What you're looking for is to first use the shift
method to shift the axis labels, so that the coordinate labels are aligned before you do the addition:
In [10]: c = da.shift(level=-1)
In [11]: c
Out[11]:
<xarray.DataArray (level: 5)>
array([ 1., 2., 3., 4., nan])
Coordinates:
* level (level) <U1 'a' 'b' 'c' 'd' 'e'
In [12]: a + c
Out[12]:
<xarray.DataArray (level: 4)>
array([1., 3., 5., 7.])
Coordinates:
* level (level) <U1 'a' 'b' 'c' 'd'
Now, when you add across a + c
the coordinates line up in the way you'd like.
For more information, see the xarray Computation docs on broadcasting by dimension name and automatic alignment.
Upvotes: 1