luator

Reputation: 5027

NumPy masked array not considering fill_value when comparing to scalar

I have a masked numpy array like the following:

mar = np.ma.array([0, 0, 100, 100], mask=[False, True, True, False], fill_value=-1)

The two values in the middle are masked, so calling mar.filled() returns [0, -1, -1, 100].

I want to compare this array to a scalar 0, i.e.:

mar == 0

which returns

masked_array(data = [True -- -- False],
             mask = [False  True  True False],
       fill_value = True)

Note that the fill_value is now True, which is the default fill value for bool arrays but does not make sense to me in this case (I would have expected it to be set to -1 == 0, which is False).

To illustrate my problem more clearly: (mar == 0).filled() and mar.filled() == 0 do not return the same result.

Is this intended behaviour or is it a bug? In any case, is there a workaround to achieve my desired behaviour? I know that I can just convert to a normal array before comparison using .filled() but I would like to avoid that if possible, since the code should not care whether it is a masked array or a normal one.

Upvotes: 0

Views: 990

Answers (2)

hpaulj

Reputation: 231605

mar == 0 uses mar.__eq__(0)

The docs for that method say:

When either of the elements is masked, the result is masked as well, but the underlying boolean data are still set, with self and other considered equal if both are masked, and unequal otherwise.

That method in turn uses mar._comparison

This first performs the comparison on the .data attributes

In [16]: mar.data
Out[16]: array([  0,   0, 100, 100])
In [17]: mar.data == 0
Out[17]: array([ True,  True, False, False])

But then it compares the masks and adjusts the values. 0 is not masked, so its 'mask' is False. Since the mask for the masked elements of mar is True, the masks don't match there, and the corresponding entries of the comparison's .data are set to False.

In [19]: np.ma.getmask(0)
Out[19]: False
In [20]: mar.mask
Out[20]: array([False,  True,  True, False])
In [21]: (mar==0).data
Out[21]: array([ True, False, False, False])
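The mask adjustment described above can be reproduced by hand with plain NumPy. This is only a sketch of the idea, not the actual body of _comparison; the variable names (raw, smask, omask, adjusted) are made up for illustration.

```python
import numpy as np

mar = np.ma.array([0, 0, 100, 100], mask=[False, True, True, False], fill_value=-1)

# Step 1: compare the raw data, ignoring the mask entirely.
raw = mar.data == 0

# Step 2: the scalar 0 is not masked, so model its mask as all False.
smask = np.ma.getmaskarray(mar)
omask = np.zeros_like(smask)

# Step 3: wherever either side is masked, replace the raw result with
# "do the two masks agree?" (True only if both sides are masked).
either_masked = smask | omask
adjusted = np.where(either_masked, smask == omask, raw)

print(raw.tolist())
print(adjusted.tolist())
```

Here raw is True, True, False, False and adjusted is True, False, False, False, which matches the .data shown in Out[21].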

I get a different fill_value in the comparison. That could be a change in v 1.14.0.

In [24]: mar==0
Out[24]: 
masked_array(data=[True, --, --, False],
             mask=[False,  True,  True, False],
       fill_value=-1)
In [27]: (mar==0).filled()
Out[27]: array([True, -1, -1, False], dtype=object)

This is confusing. Comparisons (and in general most functions) on masked arrays have to deal with the .data, the mask, and the fill value. Numpy code that isn't ma aware usually works on the .data and ignores the masking. ma methods may work with the filled() values, or with the compressed() values. This comparison method attempts to take all 3 attributes into account.


Testing the equality with a masked 0 array (same mask and fill_value):

In [34]: mar0 = np.ma.array([0, 0, 0, 0], mask=[False, True, True, False], fill_value=-1)
In [35]: mar0
Out[35]: 
masked_array(data=[0, --, --, 0],
             mask=[False,  True,  True, False],
       fill_value=-1)
In [36]: mar == mar0
Out[36]: 
masked_array(data=[True, --, --, False],
             mask=[False,  True,  True, False],
       fill_value=-1)
In [37]: _.data
Out[37]: array([ True,  True,  True, False])

mar == 0 is the same as mar == np.ma.array([0, 0, 0, 0], mask=False)
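A quick way to check that equivalence is to compare the .data and the mask of both results. The names unmasked_zeros, scalar_result and array_result are just for this sketch; the fill_value is deliberately left out of the check since it differs between NumPy versions.

```python
import numpy as np

mar = np.ma.array([0, 0, 100, 100], mask=[False, True, True, False], fill_value=-1)
unmasked_zeros = np.ma.array([0, 0, 0, 0], mask=False)

scalar_result = mar == 0
array_result = mar == unmasked_zeros

# Compare the underlying boolean data and the masks separately.
same_data = np.array_equal(scalar_result.data, array_result.data)
same_mask = np.array_equal(np.ma.getmaskarray(scalar_result),
                           np.ma.getmaskarray(array_result))
print(same_data, same_mask)
```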

Upvotes: 3

Aguy

Reputation: 8059

I don't know why (mar == 0) does not yield the desired output. But you can consider

np.equal(mar, 0)

which retains the original fill value.
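If the goal is code that does not care whether it gets a masked or a plain array, another option is np.ma.filled, which calls .filled() on masked arrays and passes ordinary ndarrays through unchanged. A minimal sketch, where equals_scalar is a hypothetical helper name and not part of NumPy:

```python
import numpy as np

def equals_scalar(arr, value):
    """Compare arr to value after substituting fill_value for masked entries.

    Hypothetical helper: np.ma.filled returns arr.filled() for masked
    arrays and the array itself for plain ndarrays.
    """
    return np.ma.filled(arr) == value

mar = np.ma.array([0, 0, 100, 100], mask=[False, True, True, False], fill_value=-1)
plain = np.array([0, 0, 100, 100])

masked_result = equals_scalar(mar, 0)    # compares [0, -1, -1, 100] to 0
plain_result = equals_scalar(plain, 0)   # compares [0, 0, 100, 100] to 0
print(masked_result.tolist())
print(plain_result.tolist())
```

The result is always a plain boolean ndarray, so the mask itself is lost; this only fits cases where the filled values are what should be compared.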

Upvotes: 0
