Reputation: 91
I want to calculate an indexed weight sum across a large (1,000,000 x 3,000) boolean numpy array. The large boolean array changes infrequently, but the weights come at query time, and I need answers very fast, without copying the whole large array, or expanding the small weight array to the size of the large array.
The result should be an array with 1,000,000 entries, each having the sum of the weights array entries corresponding to that row's True values.
I looked into using masked arrays, but they seem to require building a weights array the size of my large boolean array.
The code below gives the correct results, but I can't afford that copy during the multiply step. The multiply isn't even necessary, since the values array is boolean, but at least it handles the broadcasting properly.
I'm new to numpy, and loving it, but I'm about to give up on it for this particular problem. I've learned enough numpy to know to stay away from anything that loops in python.
My next step will be to write this routine in C (which has the added benefit of letting me save memory by using bits instead of bytes, by the way.)
Unless one of you numpy gurus can save me from cython?
import numpy
from numpy import array, multiply, sum
# Construct an example values array, alternating True and False.
# This represents four records of three attributes each:
# array([[False,  True, False],
#        [ True, False,  True],
#        [False,  True, False],
#        [ True, False,  True]], dtype=bool)
values = array([(x % 2) for x in range(12)], dtype=bool).reshape((4,3))
# Construct example weights, one for each attribute:
# array([1, 2, 3])
weights = array(range(1, 4))
# Create expensive NEW array with the weights for the True attributes.
# Broadcast the weights array into the values array.
# array([[0, 2, 0],
#        [1, 0, 3],
#        [0, 2, 0],
#        [1, 0, 3]])
weighted = multiply(values, weights)
# Add up the weights:
# array([2, 4, 2, 4])
answers = sum(weighted, axis=1)
print answers
# Rejected masked_array solution is too expensive (and oddly inverts
# the results):
masked = numpy.ma.array([[1,2,3]] * 4, mask=values)
Upvotes: 8
Views: 2070
Reputation: 151067
It seems likely that dbaupp's answer is the correct one. But just for the sake of diversity, here's another solution that saves memory. This will work even for operations that don't have a built-in numpy equivalent.
>>> import numpy
>>> values = numpy.array([(x % 2) for x in range(12)], dtype=bool).reshape((4,3))
>>> weights = numpy.array(range(1, 4))
>>> weights_stretched = numpy.lib.stride_tricks.as_strided(weights, (4, 3), (0, 8))
numpy.lib.stride_tricks.as_strided is a wonderful little function! It allows you to specify shape and strides values that let a small array mimic a much larger array. Observe -- there aren't really four rows here; it just looks that way:
>>> weights_stretched[0][0] = 4
>>> weights_stretched
array([[4, 2, 3],
       [4, 2, 3],
       [4, 2, 3],
       [4, 2, 3]])
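(A side note on the call above: the hard-coded stride of 8 assumes 8-byte integers, which may not hold on every platform. Here is a minimal, more portable sketch that reuses the array's own stride; the helper name is mine, not part of the original answer.)
import numpy

def stretch_rows(vec, nrows):
    # View a 1-D array as if it had nrows identical rows, without copying:
    # stride 0 along the new rows axis, the vector's own stride along columns.
    return numpy.lib.stride_tricks.as_strided(
        vec, shape=(nrows, vec.shape[0]), strides=(0, vec.strides[0]))

weights_stretched = stretch_rows(numpy.array(range(1, 4)), 4)  # same view as above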
So instead of passing a huge array to MaskedArray, you can pass a smaller one. (But as you've already noticed, numpy masking works in the opposite way you might expect; truth masks, rather than revealing, so you'll have to store your values inverted.) As you can see, MaskedArray doesn't copy any data; it just reflects whatever is in weights_stretched:
>>> masked = numpy.ma.MaskedArray(weights_stretched, numpy.logical_not(values))
>>> weights_stretched[0][0] = 1
>>> masked
masked_array(data =
 [[-- 2 --]
 [1 -- 3]
 [-- 2 --]
 [1 -- 3]],
             mask =
 [[ True False  True]
 [False  True False]
 [ True False  True]
 [False  True False]],
       fill_value = 999999)
Now we can just pass it to numpy.sum:
>>> numpy.sum(masked, axis=1)
masked_array(data = [2 4 2 4],
             mask = [False False False False],
       fill_value = 999999)
I benchmarked numpy.dot and the above against a 1,000,000 x 30 array. This is the result on a relatively modern MacBook Pro (numpy.dot is dot1; mine is dot2):
>>> %timeit dot1(values, weights)
1 loops, best of 3: 194 ms per loop
>>> %timeit dot2(values, weights)
1 loops, best of 3: 459 ms per loop
As you can see, the built-in numpy solution is faster. But stride_tricks is worth knowing about regardless, so I'm leaving this.
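For reference, the benchmarked functions dot1 and dot2 aren't shown above; a plausible reconstruction, assuming the setup described earlier (the bodies here are my sketch, not the original benchmark code):
import numpy

def dot1(values, weights):
    # Built-in matrix-vector product; True/False promote to 1/0.
    return values.dot(weights)

def dot2(values, weights):
    # Strided view + MaskedArray, as described above, avoiding a large temporary copy.
    stretched = numpy.lib.stride_tricks.as_strided(
        weights, shape=values.shape, strides=(0, weights.strides[0]))
    masked = numpy.ma.MaskedArray(stretched, numpy.logical_not(values))
    return masked.sum(axis=1)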
Upvotes: 3
Reputation: 86266
I don't think you need numpy for something like that. And 1000000 by 3000 is a huge array; this will not fit in your RAM, most likely.
I would do it this way:
Let's say that your data is originally in a text file:
False,True,False
True,False,True
False,True,False
True,False,True
My code:
weight = range(1, 4)
dicto = {'True': 1, 'False': 0}
with open('my_data.txt') as fin:
    a = sum(sum(dicto[ele]*w for ele, w in zip(line.strip().split(','), weight)) for line in fin)
Result:
>>> a
12
EDIT:
I think I slightly misread the question the first time around and summed everything together. Here is a version that gives the exact result the OP is after:
weight = range(1, 4)
dicto = {'True': 1, 'False': 0}
with open('my_data.txt') as fin:
    a = [sum(dicto[ele]*w for ele, w in zip(line.strip().split(','), weight)) for line in fin]
Result:
>>> a
[2, 4, 2, 4]
Upvotes: 0
Reputation: 102206
The dot product (or inner product) is what you want. It allows you to take a matrix of size m×n and a vector of length n and multiply them together, yielding a vector of length m, where each entry is the weighted sum of a row of the matrix with the entries of the vector as weights.
Numpy implements this as array1.dot(array2) (or numpy.dot(array1, array2) in older versions). e.g.:
from numpy import array
values = array([(x % 2) for x in range(12)], dtype=bool).reshape((4,3))
weights = array(range(1, 4))
answers = values.dot(weights)
print answers
# output: [ 2 4 2 4 ]
(You should benchmark this though, using the timeit module.)
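A minimal benchmarking sketch with timeit, using a smaller stand-in array so it runs quickly (the sizes and names here are my own, not from the answer):
import timeit
import numpy

values = numpy.random.rand(1000000, 30) > 0.5   # stand-in boolean array
weights = numpy.arange(1, 31)

elapsed = timeit.timeit(lambda: values.dot(weights), number=10)
print elapsed / 10   # average seconds per call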
Upvotes: 4
Reputation: 76725
Would this work for you?
import numpy as np
a = np.array([sum(row * weights) for row in values])
This uses sum() to immediately sum the row * weights values, so you don't need the memory to store all the intermediate values. Then the list comprehension collects all the values.
You said you want to avoid anything that "loops in Python". This at least does the looping with the C guts of Python, rather than an explicit Python loop, but it can't be as fast as a NumPy solution because that uses compiled C or Fortran.
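If building a million-entry Python list is itself a concern, a variant of the same idea (my sketch, not part of the original answer) preallocates the result array and fills it row by row:
import numpy as np

# Preallocate the output so no intermediate Python list is built.
a = np.empty(len(values), dtype=weights.dtype)
for i, row in enumerate(values):
    a[i] = (row * weights).sum()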
Upvotes: 1