Jacksparrow
Jacksparrow

Reputation: 115

Updating python numpy array columns multiple times

Suppose I have a 2x3 matrix A:

1 2 3   
4 5 6  

and a vector y of length 4:

0 1 2 1

as well as another 4x2 matrix B:

0 0  
1 1   
2 2   
3 3 

I want to update the columns of A multiple times by adding from rows of B. And the index of columns of A to be updated is given by y.

Using for loops, this can be done as:

for i in np.arange(4):
  A[:,y[i]] += B[i,:]

I had implemented this using ufunc.at as:

np.add.at(A.T,y,B)  

However, the performance of ufunc.at is almost as bad as using for loops.

How can I get a different vectorized implementation?

Updating using A[:,y]+=B.T seems to update each column only once.

Upvotes: 1

Views: 286

Answers (1)

Divakar
Divakar

Reputation: 221614

Approach #1

Here's one approach using summations at intervals with np.add.reduceat -

def reduceat_app(A, y, B):
    idx = y.argsort()
    y0 = y[idx]
    sep_idx = np.concatenate(([0], np.flatnonzero(y0[1:] != y0[:-1])+1))
    A += np.add.reduceat(B[idx],sep_idx).T

Approach #2

With relatively lesser number of rows in A, we can also use np.bincount to perform those bin-based summations in an iterative manner for each row, like so -

def bincount_loopy_app(A, y, B): 
    n = A.shape[1]
    for i,a in enumerate(A):
        a += np.bincount(y,B[:,i],minlength=n).astype(A.dtype)

Approach #3

We can vectorize previous approach by creating a 2D grid of y indices/bins for all elements such that for each row, we would have offsetted bins. With that offsetting, bincount could be used to perform bin based summations across all rows in a vectorized manner.

Thus, the implementation would be -

def bincount_vectorized_app(A, y, B): 
    m,n = A.shape
    IDs = y[:,None] + n*np.arange(B.shape[1])
    vals = np.bincount(IDs.ravel(), B.ravel(), minlength=m*n)
    A += vals.astype(A.dtype).reshape((m,n))

Upvotes: 1

Related Questions