shx2

Reputation: 64318

How do numpy's in-place operations (e.g. `+=`) work?

The basic question is: What happens under the hood when doing: a[i] += x?

Given the following:

import numpy as np
a = np.arange(4)
i = a > 0
i
= array([False,  True,  True,  True], dtype=bool)

I understand that:

But what happens when I do:

a[i] += x

Specifically:

  1. Is this the same as a[i] = a[i] + x? (which is not an in-place operation)
  2. Does it make a difference in this case if i is:
    • an int index, or
    • an ndarray, or
    • a slice object

Background

The reason I started delving into this is that I encountered a non-intuitive behavior when working with duplicate indices:

a = np.zeros(4)
x = np.arange(4)
indices = np.zeros(4, dtype=int)  # duplicate indices (np.int was removed in NumPy 1.24)
a[indices] += x
a
= array([ 3.,  0.,  0.,  0.])

More interesting stuff about duplicate indices in this question.

Upvotes: 25

Views: 19337

Answers (4)

Mr Fooz

Reputation: 111866

As lvc explains, there is no in-place item add method, so under the hood it uses __getitem__, then __iadd__, then __setitem__. Here's a way to empirically observe that behavior:

import numpy

class A(numpy.ndarray):
    def __getitem__(self, *args, **kwargs):
        print("getitem")
        return numpy.ndarray.__getitem__(self, *args, **kwargs)
    def __setitem__(self, *args, **kwargs):
        print("setitem")
        return numpy.ndarray.__setitem__(self, *args, **kwargs)
    def __iadd__(self, *args, **kwargs):
        print("iadd")
        return numpy.ndarray.__iadd__(self, *args, **kwargs)

a = A([1,2,3])
print("about to increment a[0]")
a[0] += 1

It prints

about to increment a[0]
getitem
iadd
setitem

Upvotes: 3

Mark Mikofski

Reputation: 20198

I don't know what's going on under the hood, but in-place operations on items in NumPy arrays and in Python lists will return the same reference, which IMO can lead to confusing results when passed into a function.

Start with Python

>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
>>> id(a[2])
12345
>>> id(b[2])
12345

... where 12345 is a unique id for the location of the value at a[2] in memory, which is the same as b[2].

So a and b refer to the same list in memory. Now try in-place addition on an item in the list.

>>> a[2] += 4
>>> a
[1, 2, 7]
>>> b
[1, 2, 7]
>>> a is b
True
>>> id(a[2])
67890
>>> id(b[2])
67890

So in-place addition on the list item changed only the value at index 2; a and b still reference the same list, whose third item was rebound to a new value, 7. That rebinding explains why, if a = 4 and b = a were integers (or floats) instead of lists, a += 1 would rebind a to a new object, leaving a and b as different references. However, if list addition is used, e.g. a += [5] with a and b referencing the same list, a is not rebound; the list is extended in place and both names see the new element.

Now for NumPy

>>> import numpy as np
>>> a = np.array([1, 2, 3], float)
>>> b = a
>>> a is b
True

Again these are the same reference, and in-place operators seem to have the same effect as for lists in Python:

>>> a += 4
>>> a
array([ 5.,  6.,  7.])
>>> b
array([ 5.,  6.,  7.])

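In-place addition on an ndarray modifies the existing array, so every name bound to it sees the change. This is not the same as a = a + 4 (numpy.add without an out argument), which creates a new array and rebinds a to it, leaving b pointing at the old one.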

>>> a = a + 4
>>> a
array([  9.,  10.,  11.])
>>> b
array([ 5.,  6.,  7.])

In-place operations on borrowed references

I think the danger here is if the reference is passed to a different scope.

>>> def f(x):
...     x += 4
...     return x

The reference to the array is passed into the scope of f, which does not make a copy; f changes the values stored at that reference and passes the same object back.

>>> f(a)
array([ 13.,  14.,  15.])
>>> f(a)
array([ 17.,  18.,  19.])
>>> f(a)
array([ 21.,  22.,  23.])
>>> f(a)
array([ 25.,  26.,  27.])

The same would be true for a Python list as well:

>>> def f(x, y):
...     x += [y]

>>> a = [1, 2, 3]
>>> b = a
>>> f(a, 5)
>>> a
[1, 2, 3, 5]
>>> b
[1, 2, 3, 5]

IMO this can be confusing and sometimes difficult to debug, so I try to only use in-place operators on references that belong to the current scope, and I try to be careful with borrowed references.
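To make the contrast concrete, here is a minimal sketch of the two styles side by side. The function names add_four_inplace and add_four_copy are made up for illustration and are not part of NumPy.

```python
import numpy as np

def add_four_inplace(x):
    # Hypothetical helper: x += 4 writes into the caller's buffer.
    x += 4
    return x

def add_four_copy(x):
    # Hypothetical helper: x + 4 allocates a new array,
    # so the caller's data is left alone.
    return x + 4

a = np.array([1, 2, 3], float)

copied = add_four_copy(a)
print(a)              # [1. 2. 3.]
print(copied is a)    # False

mutated = add_four_inplace(a)
print(a)              # [5. 6. 7.]
print(mutated is a)   # True
```

The copying version costs one extra allocation per call, which is usually a fair price when the array was handed in by someone else.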

Upvotes: 6

seberg

Reputation: 8975

Actually that has nothing to do with numpy. There is no "set/getitem in-place" in Python; these things are equivalent to a[indices] = a[indices] + x. Knowing that, it becomes pretty obvious what is going on. (EDIT: As lvc writes, the right-hand side is actually evaluated in place, so it is a[indices] = (a[indices] += x), if that were legal syntax. That has largely the same effect, though.)

Of course a += x actually is in-place, by mapping a to the np.add out argument.

It has been discussed before and numpy cannot do anything about it as such. Though there is an idea to have a np.add.at(array, index_expression, x) to at least allow such operations.
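For what it's worth, current NumPy releases do ship this as the at method of ufuncs, so np.add.at(a, indices, x) is available. A minimal sketch reusing the question's duplicate-index setup (the names buffered and accumulated are only for illustration):

```python
import numpy as np

x = np.arange(4)                    # [0, 1, 2, 3]
indices = np.zeros(4, dtype=int)    # every entry points at element 0

# Buffered form: a[indices] + x is computed once into a temporary,
# then each element is assigned to position 0; the last write wins.
buffered = np.zeros(4)
buffered[indices] += x

# Unbuffered form: np.add.at applies every contribution in turn.
accumulated = np.zeros(4)
np.add.at(accumulated, indices, x)

print(buffered)      # [3. 0. 0. 0.]
print(accumulated)   # [6. 0. 0. 0.]
```

The unbuffered call is typically slower than plain fancy-index assignment, so it is worth reaching for only when the indices can actually repeat.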

Upvotes: 7

lvc

Reputation: 35079

The first thing you need to realise is that a += x doesn't map exactly to a.__iadd__(x), instead it maps to a = a.__iadd__(x). Notice that the documentation specifically says that in-place operators return their result, and this doesn't have to be self (although in practice, it usually is). This means a[i] += x trivially maps to:

a.__setitem__(i, a.__getitem__(i).__iadd__(x))

So, the addition technically happens in-place, but only on a temporary object. There is still potentially one less temporary object created than if it called __add__, though.
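As a rough illustration of that expansion, the three calls can be spelled out by hand. This sketch assumes standard NumPy indexing semantics (a slice yields a view, a boolean or integer-array index yields a copy); the name tmp stands in for the temporary object.

```python
import numpy as np

x = 10

# Slice index: __getitem__ returns a view, so __iadd__ on the temporary
# already modifies a's buffer; __setitem__ then writes the same values back.
a = np.arange(4)
i = slice(1, None)
tmp = a.__getitem__(i)
print(np.shares_memory(tmp, a))   # True
tmp = tmp.__iadd__(x)
print(a)                          # [ 0 11 12 13]
a.__setitem__(i, tmp)
print(a)                          # [ 0 11 12 13]

# Boolean (or integer-array) index: __getitem__ returns a copy, so the
# in-place add touches only the temporary until __setitem__ stores it.
b = np.arange(4)
mask = b > 0
tmp = b.__getitem__(mask)
print(np.shares_memory(tmp, b))   # False
tmp = tmp.__iadd__(x)
print(b)                          # [0 1 2 3]
b.__setitem__(mask, tmp)
print(b)                          # [ 0 11 12 13]
```

Either way the end result is the same when the indices are unique. With a plain int index on a 1-D array the temporary is an immutable NumPy scalar, so, as far as I know, Python falls back to __add__ for that case.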

Upvotes: 22
