Reputation: 167
I am trying to process an image as a masked array to handle NoData areas. I decided to do a little testing first on one-dimensional arrays, and am seeing something odd. Here is my test code:
import numpy as np
a = np.array([0,1,4,3,4,-9999,33,34,-9999])
am = np.ma.MaskedArray(a)
am.mask = (am==-9999)
z = np.arange(35)
z[am]
I would expect that indexing the z array with the masked array would succeed, but I am seeing the following error:
Runtime error
Traceback (most recent call last):
File "<string>", line 1, in <module>
IndexError: index -9999 is out of bounds for size 35
Can anyone comment on how this should be coded correctly? I can run the following command successfully:
z[a[a>0]]
which is effectively the same thing.
Thanks!
Upvotes: 1
Views: 2357
Reputation: 77484
It's generally a bad idea to use masked arrays for purposes of indexing, precisely because the behavior that should happen at a masked value is undefined.
Think about it this way: when I look at your array a and your array z, I can say, "OK, a[0] = 0, so z[a[0]] makes sense." And so on until I come across a[5] = -9999, when I can say, "OK, that can't make sense as an index for z," and an exception can be raised.
This is in fact what happens when you naively use am as an index set: it reverts to using am.data, which contains all of the original values. If instead it tried to use something like [z[i] for i in am], you would run smack into the problem of encountering numpy.ma.core.MaskedConstant, which is not a sensible value for indexing, neither for fetching a value nor for ignoring the request to fetch one.
In [39]: l = [x for x in am]
In [40]: l
Out[40]: [0, 1, 4, 3, 4, masked, 33, 34, masked]
In [41]: type(l[-1])
Out[41]: numpy.ma.core.MaskedConstant
(In fact, if you try to index on one of these guys, you get IndexError: arrays used as indices must be of integer (or boolean) type.)
But now what happens if I come across the masked value in am.filled()? The entry at index 5 of am.filled() won't be an instance of numpy.ma.core.MaskedConstant; it will be whatever fill value you have selected. If that fill value makes sense as an index, then you will actually fetch a value at that index. Take 0 as an example. It seems like an innocuous, neutral fill value, but it is a valid index, so you get two extra accesses to the 0th entry of z:
In [42]: am.fill_value = 0
In [43]: z[am.filled()]
Out[43]: array([ 0, 1, 4, 3, 4, 0, 33, 34, 0])
and this isn't exactly what the mask is supposed to do either!
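If the goal is to keep the result the same shape as the index array, one possible workaround is to fill with a placeholder index and then reapply the mask to the lookup result. This is only a sketch: the placeholder 0 is my assumption (it is valid for any non-empty z), and the names looked_up and result are made up for illustration.

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_equal(a, -9999)   # mask every -9999 entry
z = np.arange(35)

# Fill masked slots with a placeholder index so the lookup cannot go out of bounds.
looked_up = z[am.filled(0)]

# Reapply the original mask so the placeholder lookups stay hidden.
result = np.ma.array(looked_up, mask=np.ma.getmaskarray(am))

print(result)
```

The values fetched at the masked positions are still present in result.data, so anything that bypasses the mask will see them.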
A half-baked approach is to iterate over am and exclude anything whose type is np.ma.core.MaskedConstant:
In [45]: z[np.array([x for x in am if type(x) is not np.ma.core.MaskedConstant])]
Out[45]: array([ 0, 1, 4, 3, 4, 33, 34])
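The same filter can be written a little more directly, since masked elements come back as the np.ma.masked singleton when you iterate over a 1D masked array. A minimal sketch, with valid and picked as illustrative names:

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_equal(a, -9999)
z = np.arange(35)

# Keep only the elements that are not the np.ma.masked singleton.
valid = [int(x) for x in am if x is not np.ma.masked]

picked = z[valid]
print(picked)
```

It is still a Python-level loop, so it will be slow on image-sized arrays.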
But really a much clearer expression of all of this is to just use plain logical indexing in the first place:
In [47]: z[a[a != -9999]]
Out[47]: array([ 0, 1, 4, 3, 4, 33, 34])
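If you already have the masked array in hand, its compressed() method should give the same index set without repeating the comparison. A small sketch, assuming the mask was built from the -9999 sentinel as in the question:

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_equal(a, -9999)
z = np.arange(35)

# compressed() returns the unmasked values as a plain 1-D ndarray.
idx = am.compressed()
selected = z[idx]

print(selected)
```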
Note that logical indexing like this will work fine for 2D arrays, as long as you're willing to accept that once a higher dimensional array is indexed logically, if the result is no longer conformable to the same regular 2D shape, then it will be presented in 1D, like this:
In [58]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])
In [59]: a2
Out[59]:
array([[ 10, -9999, 13],
[-9999, 1, 8],
[ 1, 8, 1]])
In [60]: z2 = np.random.rand(3,3)
In [61]: z2[np.where(a2 != -9999)]
Out[61]:
array([ 0.4739082 , 0.13629442, 0.46547732, 0.87674102, 0.08753297,
0.57109764, 0.39722408])
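To make the shape difference concrete, here is a sketch that compares boolean indexing (which flattens) with np.ma.masked_where (which keeps the 2D shape). I use a deterministic z2 built with np.arange instead of random values so the numbers are easy to check; that substitution is mine, not part of the session above.

```python
import numpy as np

a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])
z2 = np.arange(9.0).reshape(3, 3)   # stand-in for the random z2

valid = a2 != -9999                 # boolean array, same shape as a2

flat = z2[valid]                    # boolean indexing flattens to 1-D
masked = np.ma.masked_where(~valid, z2)  # masking keeps the (3, 3) shape

print(flat.shape, masked.shape)
print(masked.sum())                 # reductions skip the masked cells
```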
If instead you want something similar to the effect of a mask, you can just set values equal to NaN (for float arrays):
In [66]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]], dtype=float)
In [67]: a2
Out[67]:
array([[ 1.00000000e+01, -9.99900000e+03, 1.30000000e+01],
[ -9.99900000e+03, 1.00000000e+00, 8.00000000e+00],
[ 1.00000000e+00, 8.00000000e+00, 1.00000000e+00]])
In [68]: a2[np.where(a2 == -9999)] = np.nan
In [69]: a2
Out[69]:
array([[ 10., nan, 13.],
[ nan, 1., 8.],
[ 1., 8., 1.]])
This form of masking with NaN is suitable for a lot of vectorized array computations in NumPy, although it can be a pain to worry about converting integer-based image data to floating point first, and converting back safely at the end.
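As a rough sketch of that round trip, assuming -9999 is the NoData sentinel and that the NaN-aware reductions (np.nansum, np.nanmean) cover what you need; the names a2f and restored are just for illustration:

```python
import numpy as np

a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])

# Convert to float and replace the sentinel with NaN in one step.
a2f = np.where(a2 == -9999, np.nan, a2.astype(float))

# NaN-aware reductions ignore the NoData cells.
total = np.nansum(a2f)
mean = np.nanmean(a2f)

# Put the sentinel back before casting to the original integer dtype,
# so that NaN is never cast to an integer.
restored = np.where(np.isnan(a2f), -9999, a2f).astype(a2.dtype)

print(total, mean)
```

Plain reductions such as a2f.sum() would return nan here, so every step of the computation has to be NaN-aware.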
Upvotes: 2
Reputation: 1381
Try this code:
import numpy as np
a = np.array([0,1,4,3,4,-9999,33,34,-9999])
am = np.ma.MaskedArray(a)
am.mask = (am==-9999)
np.ma.set_fill_value(am, 0)
z = np.arange(35)
print(z[am.filled()])
Accessing am gives the masked array, whose masked entries still hold the original values underneath (its data is just a reference to the original array). Calling am.filled() after setting fill_value returns an array in which the masked elements have been replaced by fill_value.
Upvotes: 1