numpy masked array fill value still being accessed

Question

I am trying to process an image as a masked array to handle NoData areas. I decided to do a little testing first on one-dimensional arrays, and I am seeing something odd. Here is my test code:

    import numpy as np

    a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
    am = np.ma.MaskedArray(a)
    am.mask = (am==-9999)

    z = np.arange(35)

    z[am]

I would expect indexing the z array with the masked array to succeed, but I am seeing the following error:

    Runtime error 
    Traceback (most recent call last):
      File "", line 1, in 
    IndexError: index -9999 is out of bounds for size 35

Can anyone comment on how this should be coded correctly? I can run the following command successfully:

    z[a[a>0]]

which is effectively the same thing.

Thanks!

ely · Accepted Answer

It's generally a bad idea to use masked arrays for purposes of indexing, precisely because the behavior that should happen at a masked value is undefined.

Think about it this way: when I look at your array a and your array z, I can say "Ok, a[0] = 0 so z[a[0]] makes sense." And so on until I come across a[5] = -9999 when I can say, "OK, that can't make sense as an index for z" and an exception can be raised.

This is in fact what happens when you naively use am as an index set: NumPy ignores the mask and falls back to am.data, which still contains all of the original values, including -9999. If instead it tried something like [z[i] for i in am], you would run smack into numpy.ma.core.MaskedConstant, which is not a sensible value for indexing: it says neither which value to fetch nor that the request should be skipped.

    In [39]: l = [x for x in am]

    In [40]: l
    Out[40]: [0, 1, 4, 3, 4, masked, 33, 34, masked]

    In [41]: type(l[-1])
    Out[41]: numpy.ma.core.MaskedConstant

(In fact, if you try to index with one of these, you get IndexError: arrays used as indices must be of integer (or boolean) type.)
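The fallback to the raw data can be checked directly. This is a minimal sketch that rebuilds the am from the question; the name raw is only illustrative, and np.ma.masked_equal stands in for the two-step mask assignment used there:

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_equal(a, -9999)

# Converting to a plain ndarray drops the mask but keeps the data,
# so the NoData sentinel is still present among the values.
raw = np.asarray(am)
print(raw)
print(am.data)
```

Both printed arrays still contain -9999, which is why using am directly as an index runs out of bounds.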

But now what happens if I come across the masked value in am.filled()? The entry at the 5th index of am.filled() won't be an instance of numpy.ma.core.MaskedConstant -- it will be whatever fill value has been selected by you. If that fill value makes sense as an index, well then you will actually fetch a value by indexing at that index. Take 0 as an example. It seems like an innocuous, neutral fill value, but actually it represents a valid index, so you get two extra accesses to the 0th entry of z:

    In [42]: am.fill_value = 0

    In [43]: z[am.filled()]
    Out[43]: array([ 0,  1,  4,  3,  4,  0, 33, 34,  0])

and this isn't exactly what the mask is supposed to do either!

A half-baked approach is to iterate over am and exclude anything with type of np.ma.core.MaskedConstant:

    In [45]: z[np.array([x for x in am if type(x) is not np.ma.core.MaskedConstant])]
    Out[45]: array([ 0,  1,  4,  3,  4, 33, 34])
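If you do want to stay with the masked array, a less hand-rolled variant of the same idea is MaskedArray.compressed(), which returns the unmasked entries as a plain 1-D ndarray. A minimal sketch, reusing the a and z from the question (valid_idx and result are illustrative names, and np.ma.masked_equal is swapped in for the explicit mask assignment):

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
z = np.arange(35)

# Mask every entry equal to the NoData sentinel.
am = np.ma.masked_equal(a, -9999)

# compressed() drops the masked entries and returns a plain ndarray,
# so the result is safe to use as an integer index.
valid_idx = am.compressed()
result = z[valid_idx]
print(result)  # [ 0  1  4  3  4 33 34]
```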

But really a much clearer expression of all of this is to just use plain logical indexing in the first place:

    In [47]: z[a[a != -9999]]
    Out[47]: array([ 0,  1,  4,  3,  4, 33, 34])

Note that logical indexing like this will work fine for 2D arrays, as long as you're willing to accept that the result comes back as a 1D array: the selected elements generally no longer fit a regular 2D shape, so NumPy returns them flattened, like this:

    In [58]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])

    In [59]: a2
    Out[59]:
    array([[   10, -9999,    13],
           [-9999,     1,     8],
           [    1,     8,     1]])

    In [60]: z2 = np.random.rand(3,3)

    In [61]: z2[np.where(a2 != -9999)]
    Out[61]:
    array([ 0.4739082 ,  0.13629442,  0.46547732,  0.87674102,  0.08753297,
            0.57109764,  0.39722408])
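Although the selection is flattened, the 2D positions of the selected cells are not lost. The sketch below is one way to write a flat result back into a 2D array; the names valid, flat, rows, cols, and out are illustrative, and the seeded generator is used only so the run is repeatable:

```python
import numpy as np

a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
z2 = rng.random((3, 3))

valid = a2 != -9999   # boolean array with the same shape as a2
flat = z2[valid]      # 1-D array holding the 7 valid cells

# Indexing with the boolean array or with np.where(...) selects the same cells.
same = np.array_equal(flat, z2[np.where(valid)])

# The row/column positions of the selected cells are still recoverable,
rows, cols = np.nonzero(valid)

# so a computation on the flat values can be placed back into a 2-D array.
out = np.zeros_like(z2)
out[rows, cols] = flat * 2

print(flat.shape)  # (7,)
print(out.shape)   # (3, 3)
```

Cells that were excluded keep whatever value out was initialized with (zero here), so choosing that initial value is part of the design.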

If instead you want something similar to the effect of a mask, you can just set values equal to NaN (for float arrays):

    In [66]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]], dtype=float)

    In [67]: a2
    Out[67]:
    array([[  1.00000000e+01,  -9.99900000e+03,   1.30000000e+01],
           [ -9.99900000e+03,   1.00000000e+00,   8.00000000e+00],
           [  1.00000000e+00,   8.00000000e+00,   1.00000000e+00]])

    In [68]: a2[np.where(a2 == -9999)] = np.nan

    In [69]: a2
    Out[69]:
    array([[ 10.,  nan,  13.],
           [ nan,   1.,   8.],
           [  1.,   8.,   1.]])

This form of masking with NaN is suitable for a lot of vectorized array computations in NumPy, although it can be a pain to worry about converting integer-based image data to floating point first, and converting back safely at the end.
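The round trip from integers to floats and back might look like the sketch below. It assumes the -9999 sentinel from the question; NODATA, img, work, scaled, and out are illustrative names, and doubling the values is only a stand-in for a real computation:

```python
import numpy as np

NODATA = -9999  # sentinel value taken from the question

img = np.array([[10, NODATA, 13], [NODATA, 1, 8], [1, 8, 1]])

# Integer arrays cannot hold NaN, so work on a float copy.
work = img.astype(float)
work[work == NODATA] = np.nan

# NaN-aware reductions skip the NoData cells.
mean_valid = np.nanmean(work)

# Stand-in computation; NaN propagates through ordinary arithmetic.
scaled = work * 2

# Restore the sentinel before casting back to the original integer dtype.
out = np.where(np.isnan(scaled), NODATA, scaled).astype(img.dtype)

print(mean_valid)  # 6.0
print(out)
```

Restoring the sentinel before the cast matters, since casting NaN to an integer type does not give a meaningful value.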
