Antoine Belgodere

Reputation: 53

How to vectorize increments in Python

I have a 2D array and a list of numbers to add to some of its cells. I want to vectorize the operation to save time. The problem arises when several numbers must be added to the same cell: in that case, the vectorized code only adds the last one. 'a' is my array, 'x' and 'y' are the coordinates of the cells I want to increment, and 'z' contains the numbers I want to add.

import numpy as np

a=np.zeros((4,4))
x=[1,2,1]
y=[0,1,0]
z=[2,3,1]
a[x,y]+=z
print(a)

As you can see, a[1,0] should be incremented twice: once by 2 and once by 1. So the expected array is:

[[0. 0. 0. 0.]
 [3. 0. 0. 0.]
 [0. 3. 0. 0.]
 [0. 0. 0. 0.]]

but instead I get:

[[0. 0. 0. 0.]
 [1. 0. 0. 0.]
 [0. 3. 0. 0.]
 [0. 0. 0. 0.]]

The problem would be easy to solve with a for loop, but I wonder if I can correctly vectorize this operation.

Upvotes: 5

Views: 481

Answers (4)

Divakar

Reputation: 221584

Approach #1: Bincount-based method for performance

We can use np.bincount for efficient bin-based summation, an approach adapted from an earlier post:

def accumulate_arr(x, y, z, out):
    # Get output array shape
    shp = out.shape

    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index((x,y),shp)
    # Or, if x and y are NumPy integer arrays: lidx = x*shp[1] + y

    # Accumulate arr with IDs from lidx
    out += np.bincount(lidx,z,minlength=out.size).reshape(out.shape)
    return out

If the output array starts out as all zeros, you can pass the output shape to the function instead of the array and return the reshaped bincount result directly.
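As a minimal sketch of that zeros-initialized variant (the function name accumulate_from_shape is hypothetical, not part of the original answer), the function below takes the target shape and returns the bincount result as the final array:

```python
import numpy as np

def accumulate_from_shape(x, y, z, shape):
    # Hypothetical helper: flatten each (row, col) pair into one linear index
    lidx = np.ravel_multi_index((x, y), shape)
    # Sum the weights z per linear index; minlength pads so every cell is covered
    size = int(np.prod(shape))
    return np.bincount(lidx, weights=z, minlength=size).reshape(shape)

result = accumulate_from_shape([1, 2, 1], [0, 1, 0], [2, 3, 1], (4, 4))
print(result)
```

Note that np.bincount with weights returns a float array, so the result is float even when z holds integers.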

Output on given sample -

In [48]: accumulate_arr(x,y,z,a)
Out[48]: 
array([[0., 0., 0., 0.],
       [3., 0., 0., 0.],
       [0., 3., 0., 0.],
       [0., 0., 0., 0.]])

Approach #2: Using sparse-matrix for memory-efficiency

In [54]: from scipy.sparse import coo_matrix

In [56]: coo_matrix((z,(x,y)), shape=(4,4)).toarray()
Out[56]: 
array([[0, 0, 0, 0],
       [3, 0, 0, 0],
       [0, 3, 0, 0],
       [0, 0, 0, 0]])

If a sparse matrix is acceptable as the result, skip the .toarray() call for a memory-efficient solution.

Upvotes: 0

javidcf

Reputation: 59711

Use np.add.at for that:

import numpy as np

a = np.zeros((4,4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]
np.add.at(a, (x, y), z)
print(a)
# [[0. 0. 0. 0.]
#  [3. 0. 0. 0.]
#  [0. 3. 0. 0.]
#  [0. 0. 0. 0.]]
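To make the difference concrete, here is a small 1-D sketch (the names idx, vals, buffered and unbuffered are illustrative only) that compares the plain += with np.add.at when an index repeats:

```python
import numpy as np

idx = [0, 0, 2]   # index 0 appears twice
vals = [5, 7, 1]

buffered = np.zeros(3)
buffered[idx] += vals             # repeated index 0: only one write survives

unbuffered = np.zeros(3)
np.add.at(unbuffered, idx, vals)  # every increment is applied, so 5 + 7 lands in cell 0

print(buffered)
print(unbuffered)
```

np.add.at is typically slower than a plain fancy-indexed += on large inputs, so it is worth using only when indices can actually repeat.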

Upvotes: 4

user7440787

Reputation: 841

You could create a 3x4x4 array with one 4x4 layer per increment, add each value of z to its own layer, and then sum over the layers:

import numpy as np
x = [1,2,1]
y = [0,1,0]
z = [2,3,1]
a = np.zeros((3,4,4))
n = range(a.shape[0])
a[n,x,y] += z
print(a.sum(axis=0))

which will result in

[[0. 0. 0. 0.]
 [3. 0. 0. 0.]
 [0. 3. 0. 0.]
 [0. 0. 0. 0.]]

Upvotes: 0

Alex_6

Reputation: 319

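When you run a[x,y]+=z, every right-hand side is computed from the original values before anything is written back, so the operation decomposes as:

a[1, 0], a[2, 1], a[1, 0] = [a[1, 0] + 2, a[2, 1] + 3, a[1, 0] + 1]
# Since a starts as all zeros, this is equivalent to:
a[1, 0] = 2
a[2, 1] = 3
a[1, 0] = 1

That's why it doesn't work: the last assignment to a[1, 0] overwrites the first. If you instead increment the array in a loop over the (x, y, z) triples, it should work.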
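A short sketch of both behaviours, using the question's own data (the temporary name tmp is only for illustration): the first part spells out the read-add-write sequence, and the second is the explicit loop.

```python
import numpy as np

a = np.zeros((4, 4))
x, y, z = [1, 2, 1], [0, 1, 0], [2, 3, 1]

# What a[x, y] += z amounts to: read once, add, then write back
tmp = a[x, y]   # copy of the selected cells, all still 0
tmp = tmp + z   # [2., 3., 1.]
# Writing tmp back with a[x, y] = tmp would store 2 and then 1 in a[1, 0]

# Explicit loop: each increment sees the result of the previous one
for i, j, v in zip(x, y, z):
    a[i, j] += v

print(a)
```

The loop is correct but runs in Python, so it will be slow for long index lists.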

Upvotes: 0
