Summing grouped items in a numpy array without looping

Question

I'm trying to sum the non-zero values in array a that share the same value in array label, then set all of them to 0 except one, which should hold their sum:

import numpy as np
a =    np.array([[0,0,0,5,5,0],
                 [1,1,0,2,2,0],
                 [0,0,0,0,2,0],
                 [0,0,0,0,0,0],
                 [0,0,0,3,3,3]])

label = np.array([[0,1,2,3,3,3],
                  [1,1,4,4,4,3],
                  [1,4,4,5,4,6],
                  [1,4,4,4,7,8],
                  [9,5,5,5,5,5]])

#should produce the following result:
result =        [[0,0,0,0,0,10],
                 [2,0,0,0,6,0],
                 [0,0,0,0,0,0],
                 [0,0,0,0,0,0],
                 [0,0,0,0,9,0]]

It doesn't matter which position receives the sum. I couldn't think of any way to do this other than looping:

a_ = a.ravel()
labels_ = label.ravel()
# only labels that have at least one non-zero value in a
list_of_labels = np.unique(label[a>0])

for item in list_of_labels:
    summ = np.sum(a_[np.argwhere((a_> 0) & (labels_ == item))])
    print(summ)

Paul Panzer · Accepted Answer

You can get the sums using np.bincount with the weights parameter. If I'm not mistaken, np.bincount is O(n), as is the rest of the code below:

# get the sums
cnts = np.bincount(label.ravel(), a.ravel())
# next two lines get indices of the last occurrence of each label
psns = np.full(cnts.shape, -1, dtype=int)
psns[label.ravel()] = range(label.size)
# now plug the sums at the appropriate positions
resflat = np.zeros((a.size + 1,), dtype=a.dtype)
resflat[psns] = cnts
result = resflat[:-1].reshape(a.shape)
result
# array([[ 0,  0,  0,  0,  0,  0],
#        [ 0,  0,  0,  0,  0, 10],
#        [ 0,  0,  0,  0,  0,  0],
#        [ 2,  0,  0,  6,  0,  0],
#        [ 0,  0,  0,  0,  0,  9]])
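
To make the first step easier to follow, here is a minimal sketch of what np.bincount does when given weights, followed by the same steps wrapped in a function and checked against the arrays from the question. The toy arrays and the helper name place_label_sums are assumptions for illustration, not part of the answer. The check only verifies that each label's total is preserved and lands on a single cell; which cell receives the sum depends on how NumPy handles repeated indices in the assignment, and the question says any position is acceptable.

```python
import numpy as np

# Toy data (hypothetical): three labels, five values.
toy_labels = np.array([0, 2, 2, 1, 2])
toy_values = np.array([4, 1, 3, 7, 5])

# With weights, bincount adds up the weights per label instead of counting.
toy_sums = np.bincount(toy_labels, weights=toy_values)
print(toy_sums)  # [4. 7. 9.]


def place_label_sums(a, label):
    """Hypothetical wrapper: put each label's sum on one cell, zero elsewhere.

    Assumes label holds non-negative integers, as np.bincount requires.
    """
    flat_label = label.ravel()
    sums = np.bincount(flat_label, weights=a.ravel())
    # One flat position per label; -1 marks label values that never occur.
    pos = np.full(sums.shape, -1, dtype=int)
    pos[flat_label] = np.arange(flat_label.size)
    # The extra trailing slot absorbs the writes for the -1 positions.
    out = np.zeros(a.size + 1, dtype=a.dtype)
    out[pos] = sums
    return out[:-1].reshape(a.shape)


a = np.array([[0, 0, 0, 5, 5, 0],
              [1, 1, 0, 2, 2, 0],
              [0, 0, 0, 0, 2, 0],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 3, 3, 3]])
label = np.array([[0, 1, 2, 3, 3, 3],
                  [1, 1, 4, 4, 4, 3],
                  [1, 4, 4, 5, 4, 6],
                  [1, 4, 4, 4, 7, 8],
                  [9, 5, 5, 5, 5, 5]])

res = place_label_sums(a, label)
for lab in np.unique(label):
    mask = label == lab
    # Each label keeps its total, concentrated on at most one cell.
    assert res[mask].sum() == a[mask].sum()
    assert np.count_nonzero(res[mask]) <= 1
```

Because the weighted sums come back as floats, assigning them into an integer output truncates any fractional part; for integer input like the arrays above this is harmless.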
