Till Hoffmann
Till Hoffmann

Reputation: 9877

Avoid numpy distributing an operation for overloaded operator

By default, numpy distributes operations across arrays if it doesn't know the type of the other object. This works well in most cases. For example, the following behaves as expected.

np.arange(5) + 5 # = [5, 6, 7, 8, 9]

I would like to define a class that overrides the addition operator as illustrated in the code below.

class Example:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return other + self.value

    def __radd__(self, other):
        return other + self.value

It works well for scalar values. For example,

np.arange(5) + Example(5) # = [5, 6, 7, 8, 9]

However, it doesn't quite do what I want for vector values. For example,

np.arange(5) + Example(np.arange(5)) 

yields the output

array([array([0, 1, 2, 3, 4]), array([1, 2, 3, 4, 5]),
   array([2, 3, 4, 5, 6]), array([3, 4, 5, 6, 7]),
   array([4, 5, 6, 7, 8])], dtype=object)

because the __add__ operator of the preceding numpy array takes priority over the __radd__ operator that I have defined. Numpy's __add__ operator calls __radd__ for each element of the numpy array yielding an array of arrays. How can I avoid numpy distributing the operation? I would like to avoid subclassing numpy arrays.

Upvotes: 3

Views: 358

Answers (1)

MSeifert
MSeifert

Reputation: 152667

For every np.ndarray and subclasses that are not too eager (for example in earlier numpy versions the np.ma.MaskedArray ignored it) you can define __array_priority__ even if you don't subclass np.ndarray directly.

The thinking behind this is simple: The subclass with the higher priority decides which operator defines the mathematical operation and not the order of the operation.

A working example with you Example would be:

class Example:

    # Define this priority
    __array_priority__ = 2

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return other + self.value

    def __radd__(self, other):
        return other + self.value


import numpy as np
np.arange(5) + Example(np.arange(5)) 
# returns array([0, 2, 4, 6, 8])

So it works as wanted. But notice that there are some subtle problems when relying on this approach:

It doesn't work with MaskedArrays because these have a priority of 15 (so you would need to alter your priority to 16+ to make it work):

import numpy as np
np.ma.array(np.arange(5)) + Example(np.arange(5)) 

# returns:
masked_array(data = [array([0, 1, 2, 3, 4]) array([1, 2, 3, 4, 5])    array([2, 3, 4, 5, 6])
array([3, 4, 5, 6, 7]) array([4, 5, 6, 7, 8])],
         mask = False,
   fill_value = ?)

and for example it doesn't work with astropy.units.Quantity because they have defined their priority as 10000:

import astropy.units as u
(np.arange(5)*u.dimensionless_unscaled) + Example(np.arange(5)) 
#returns:
<Quantity [array([ 0.,  1.,  2.,  3.,  4.]),
           array([ 1.,  2.,  3.,  4.,  5.]),
           array([ 2.,  3.,  4.,  5.,  6.]),
           array([ 3.,  4.,  5.,  6.,  7.]),
           array([ 4.,  5.,  6.,  7.,  8.])]>

And it doesn't work with any class that doesn't use the numpy-machinery.

Upvotes: 3

Related Questions