Sun Bear

Reputation: 8234

How to get the sum of the value of certain elements of a NumPy array?

I have three NumPy arrays. Two of them (say a and b) hold the left and right column bounds for each row of the third array (c): for row n, the elements c[n, a[n]:b[n]] are to be summed, so the right bound is exclusive. How do I do this in NumPy? At present I do it with a plain Python loop, like so:

>>> import numpy as np
>>> a = np.array([ 3,  7, 11])
>>> b = np.array([25, 21, 17])
>>> c = np.arange(90).reshape(3, 30)
>>> c
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
        46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
        76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> row = 0
>>> for l, r in zip(a, b):
...     print(sum(c[row, l:r]))
...     row += 1
...
297
609
441
>>> sum([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
...      16, 17, 18, 19, 20, 21, 22, 23, 24])
297
>>> sum([37, 38, 39, 40, 41, 42, 43, 44, 45,
...      46, 47, 48, 49, 50])
609
>>> sum([71, 72, 73, 74, 75,
...      76])
441

Upvotes: 0

Views: 714

Answers (2)

Ehsan

Reputation: 12397

Another loop version of it:

[c[n,np.s_[i:j]].sum() for n,(i,j) in enumerate(zip(a,b))]

output:

[297, 609, 441]

For a small c I would recommend the other answer; for a large c, use this one.
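As a self-contained sketch, the list comprehension above can be wrapped in a small function. The name row_range_sums_loop is hypothetical (it is not from the original answer), and zip over the rows of c is used here in place of enumerate; the sample arrays are the ones from the question.

```python
import numpy as np

# Sample data from the question.
a = np.array([3, 7, 11])      # left bounds (inclusive), one per row
b = np.array([25, 21, 17])    # right bounds (exclusive), one per row
c = np.arange(90).reshape(3, 30)


def row_range_sums_loop(c, left, right):
    """Sum c[n, left[n]:right[n]] for every row n.

    Hypothetical helper name, used only for illustration.
    """
    # Iterating over a 2-D array yields its rows, so each row is
    # paired with its own (left, right) bounds.
    return np.array([row[i:j].sum() for row, i, j in zip(c, left, right)])


result = row_range_sums_loop(c, a, b)
print(result.tolist())  # [297, 609, 441]
```

Each slice is a view into c, so the loop only touches the elements inside the bounds; the Python-level loop runs once per row.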

Upvotes: 0

Marat

Reputation: 15738

idx = np.repeat(np.arange(c.shape[1])[None, :], c.shape[0], axis=0)
(c * ((idx >= a[:, None]) & (idx < b[:, None]))).sum(axis=1)
# output: array([297, 609, 441])

What is going on here:

  1. Create a tile of ranges: [[0, 1, ..., n], [0, 1, ..., n], ...], where n is the last column index of c.
  2. Row-wise, build a mask that is True where the column index is greater than or equal to a and less than b. a[:, None] just adds a dimension; NumPy then broadcasts this array to match the shape of the ranges.
  3. Multiply c by this mask. Booleans interpreted as integers are either 0 or 1, so every element except those to be summed becomes zero.
  4. Sum along rows.
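The four steps above can be traced one at a time in the sketch below. It makes two substitutions relative to the answer's code, both of which are assumptions of this sketch rather than the author's method: the column indices are left as a 1-D array and broadcasting is relied on instead of np.repeat, and np.where is used in place of multiplying by the mask. The variable names cols, mask and masked_sums are illustrative.

```python
import numpy as np

# Sample data from the question.
a = np.array([3, 7, 11])      # left bounds (inclusive)
b = np.array([25, 21, 17])    # right bounds (exclusive)
c = np.arange(90).reshape(3, 30)

# Step 1: column indices. A 1-D range of shape (30,) is enough here,
# because broadcasting stretches it across the rows in step 2.
cols = np.arange(c.shape[1])

# Step 2: a[:, None] and b[:, None] have shape (3, 1); comparing them
# with cols (shape (30,)) broadcasts to a boolean mask of shape (3, 30).
mask = (cols >= a[:, None]) & (cols < b[:, None])

# Step 3: keep the values of c inside the bounds, zero elsewhere.
selected = np.where(mask, c, 0)

# Step 4: sum along each row.
masked_sums = selected.sum(axis=1)

print(mask.sum(axis=1).tolist())  # [22, 14, 6]  (elements summed per row)
print(masked_sums.tolist())       # [297, 609, 441]
```

The per-row counts 22, 14 and 6 show why the selected ranges cannot be stacked into one rectangular array, whereas the mask always has the same shape as c.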

PS. It takes longer to explain than to write the code. Note that since the ranges defined by the pairs from a and b have arbitrary lengths, we can't store them in native NumPy data structures. This is why we have to play with masks here.

Upvotes: 1
