Vladimir Vargas

Reputation: 1834

Efficient cumulative sum in Python

I have a vector a of known size N, such that np.sum(a) is 1 and np.all(a >= 0) is true. I want to determine the minimum number of entries needed to reach a threshold t. For instance, I would do something like:

idx = np.argsort(a)
asorted = a[idx][::-1]
sum_ = 0
number = 0
while sum_ < t:
    number += 1
    sum_ = np.sum(asorted[:number])

As soon as sum_ reaches t, the loop stops, and the variable number tells me the minimum number of entries needed to reach that threshold.

I am looking for the most efficient way of getting this number, as I have to perform this operation millions of times.

Upvotes: 2

Views: 484

Answers (4)

norok2
norok2

Reputation: 26956

(EDITED)

(EDIT2: added a more specialized JIT-ed version to address issues when using np.sort() with numba.)

(EDIT3: included timing for the recursive approach with median pivoting from @hilberts_drinking_problem's answer)

I am not 100% sure what you are after, because the first two lines of your code seem to be doing nothing, but following @hilberts_drinking_problem's comment I have edited my answer. I assume that you have a typo and:

sum_ = np.sum(arr[:i])

should be:

sum_ = np.sum(asorted[:i])

Then, your solution can be written as a function like:

import numpy as np


def min_sum_threshold_orig(arr, threshold=0.5):
    idx = np.argsort(arr)
    arr_sorted = arr[idx][::-1]
    sum_ = 0
    i = 0
    while sum_ < threshold:
        i += 1
        sum_ = np.sum(arr_sorted[:i])
    return i

However:

  1. instead of np.argsort() and indexing, you could use np.sort() directly;
  2. there is no need to compute the entire sum at each iteration; you can reuse the sum from the previous iteration instead;
  3. using a while loop is risky, because if threshold is sufficiently high (above 1.0, given your assumptions) the loop will never end.

Addressing these points one gets to:

def min_sum_threshold(arr, threshold=0.5):
    arr = np.sort(arr)[::-1]
    sum_ = 0
    for i in range(arr.size):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return i + 1
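
For example (the values here are chosen purely for illustration):

a = np.array([0.05, 0.4, 0.2, 0.35])
min_sum_threshold(a)
# 2  (0.4 + 0.35 >= 0.5)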

In the above, the explicit looping becomes the bottleneck. A good way of addressing this is to use numba:

import numba as nb


min_sum_threshold_nbn = nb.jit(min_sum_threshold)  # JIT-compile the loop-based version
min_sum_threshold_nbn.__name__ = 'min_sum_threshold_nbn'  # readable label for benchmarks

But this may not be the most efficient approach, as numba is relatively slow when new arrays are created. A possibly faster approach is to use arr.sort() instead of np.sort(), because it sorts in-place and thus avoids creating a new array:

@nb.jit
def min_sum_thres_nb_inplace(arr, threshold=0.5):
    arr.sort()  # in-place ascending sort: the largest values end up at the end
    sum_ = 0
    # accumulate from the largest element downwards
    for i in range(arr.size - 1, -1, -1):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return arr.size - i  # number of elements taken from the top
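
Note that this version modifies its argument: if the original order of arr matters, pass a copy, e.g. min_sum_thres_nb_inplace(arr.copy()).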

Alternatively, one could JIT only the portion of code after the sorting:

@nb.jit
def _min_sum_thres_nb(arr, threshold=0.5):
    sum_ = 0.0
    for i in range(arr.size):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return i + 1


def min_sum_thres_nb(arr, threshold=0.5):
    return _min_sum_thres_nb(np.sort(arr)[::-1], threshold)

The difference between the two will be minimal for larger inputs. For smaller ones, min_sum_thres_nb() will be dominated by the comparatively slow extra function call. Because of the pitfalls of benchmarking functions which modify their input, min_sum_thres_nb_inplace() is omitted from the benchmarks, with the understanding that for very small inputs it is as fast as min_sum_threshold_nbn(), and for larger ones it has essentially the same performance as min_sum_thres_nb().
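
(If one nevertheless wanted to time the in-place variant, a minimal sketch is to pass a fresh copy on every call, accepting that the copy overhead is then included in the measurement:)

import timeit

arr = np.random.rand(1000)
arr /= arr.sum()
# pass a fresh copy so every call sorts unsorted data
t = timeit.timeit(lambda: min_sum_thres_nb_inplace(arr.copy()), number=100)
print(t / 100)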


Alternatively, one could use a vectorized approach like in @yatu's answer:

def min_sum_threshold_np_sum(arr, threshold=0.5):
    return np.sum(np.cumsum(np.sort(arr)[::-1]) < threshold) + 1

or, better, use np.searchsorted() which avoids creating an unnecessary temporary array with the comparison:

def min_sum_threshold_np_ss(arr, threshold=0.5):
    return np.searchsorted(np.cumsum(np.sort(arr)[::-1]), threshold) + 1

or, assuming that sorting the entire array is unnecessarily expensive:

def min_sum_threshold_np_part(arr, threshold=0.5):
    n = arr.size
    # since the mean element is 1 / n, at most m of the largest elements
    # are needed to reach the threshold
    m = int(n * threshold) + 1
    part_arr = np.partition(arr, n - m)[n - m:]
    return np.searchsorted(np.cumsum(np.sort(part_arr)[::-1]), threshold) + 1
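
As a quick sanity check that these agree (illustrative, normalized input as per the question's assumptions):

arr = np.random.rand(100)
arr /= arr.sum()
assert (min_sum_threshold_np_sum(arr)
        == min_sum_threshold_np_ss(arr)
        == min_sum_threshold_np_part(arr))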

An even more sophisticated approach using recursion and median pivoting is:

def min_sum_thres_rec(arr, threshold=0.5, cutoff=64):
    n = arr.size
    if n <= cutoff:
        # small inputs: a plain sort + cumsum is cheaper than recursing further
        return np.searchsorted(np.cumsum(np.sort(arr)[::-1]), threshold) + 1
    else:
        m = n // 2
        partitioned = np.partition(arr, m)
        low = partitioned[:m]
        high = partitioned[m:]
        sum_high = np.sum(high)
        if sum_high >= threshold:
            return min_sum_thres_rec(high, threshold, cutoff)
        else:
            return min_sum_thres_rec(low, threshold - sum_high, cutoff) + high.size

(the last three are adapted from @hilberts_drinking_problem's answer)


Benchmarking these with the input generated from this:

def gen_input(n, a=0, b=10000):
    arr = np.random.randint(a, b, n)
    arr = arr / np.sum(arr)
    return arr
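
The exact benchmark driver is not shown here; a minimal sketch along these lines (input sizes and repeat counts are arbitrary) could reproduce the comparison:

import timeit

funcs = (min_sum_threshold, min_sum_threshold_nbn, min_sum_threshold_np_ss,
         min_sum_threshold_np_part, min_sum_thres_rec)
for n in (100, 1_000, 10_000):
    arr = gen_input(n)
    for func in funcs:
        best = min(timeit.repeat(lambda: func(arr), number=100, repeat=3)) / 100
        print(f'{func.__name__:28s}  n={n:6d}  {best:.3e} s')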

The benchmarks give the following:

[benchmark plots: bm_full (all input sizes) and bm_zoom (zoom on small inputs)]

These indicate that for small enough inputs, the numba approach is the fastest, but as soon as the input exceeds ~600 elements for the naive approach (or ~900 for the optimized one), the NumPy approaches using np.partition(), while less memory efficient, become faster.

Eventually, beyond ~4000 elements, min_sum_thres_rec() gets faster than all the other proposed methods. It may be possible to write an even faster numba-based implementation of this method.

Note that the optimized numba approach is strictly faster than the naive NumPy approaches that were tested.

Upvotes: 4

hilberts_drinking_problem

Reputation: 11602

It just occurred to me that there is a recursive linear time algorithm for this based on median pivoting: each call does O(n) work in np.partition and then recurses on one half of the elements, so the total work is n + n/2 + n/4 + ... = O(n).

def min_sum_rec(arr, t=0.5):
  n = arr.size
  # default to sorting for small arrays
  if n <= 100:
    return np.searchsorted(np.sort(arr)[::-1].cumsum(), t) + 1
  partitioned = np.partition(arr, n//2)
  low = partitioned[:n//2]
  high = partitioned[n//2:]
  sum_high = high.sum()
  if sum_high >= t:
    return min_sum_rec(high, t)
  else:
    return min_sum_rec(low, t - sum_high) + high.size

Here is a comparison to my previous O(n log n) solution, in seconds:

N           min_sum_rec  num_to_t
10            0.000041  0.000038
100           0.000025  0.000028
1000          0.000086  0.000042
10000         0.000321  0.000310
100000        0.002547  0.003259
1000000       0.028826  0.039854
10000000      0.247731  0.431744
100000000     2.371766  4.800107

Previous solution, which can be faster for smaller-sized arrays:

In addition to using cumsum, note that the average element of the array is 1/N, so the sum of the m largest elements is at least m/N. Hence, it takes at most t*N + 1 of the largest elements to add up to t. For small t, this provides an optimization opportunity: make an O(N) call to np.partition, followed by sorting of only the t*N largest elements:

import numpy as np
np.random.seed(0)

a = np.random.rand(10**6)
a /= a.sum()
t = 1e-3


def num_to_t_sort(a, t):
  c = np.sort(a)[::-1].cumsum()
  return np.searchsorted(c, t) + 1


def num_to_t(a, t):
  n = len(a)
  m = int(n * t) + 1
  b = np.partition(a, n-m)[n-m:]
  b[::-1].sort()
  c = b.cumsum()
  return np.searchsorted(c, t) + 1


assert num_to_t(a, t) == num_to_t_sort(a, t)
%timeit num_to_t(a, t)      # 100 loops, best of 3: 11.8 ms per loop
%timeit num_to_t_sort(a, t) # 10 loops, best of 3: 107 ms per loop

A similar optimization applies if t tends to be close to 1. If you are repeating the operation for the same array and different t, you are probably better off storing c = np.sort(a)[::-1].cumsum() and calling np.searchsorted for each t.
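
For instance (ts is a hypothetical array of thresholds; np.searchsorted accepts an array of query values):

c = np.sort(a)[::-1].cumsum()
ts = np.array([1e-4, 1e-3, 1e-2])  # hypothetical thresholds
counts = np.searchsorted(c, ts) + 1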

Also, I am assuming that every element is strictly positive. Otherwise, one needs to consider two separate cases depending on whether t occurs in c.

Upvotes: 3

doggie_breath

Reputation: 802

Depending on the length of the arrays, maybe sort, and then you could use a cumulative sum? It would at least be faster than that while loop.

>>> a = np.array(range(10)[::-1])
>>> a
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> b = np.cumsum(a)
>>> b
array([ 9, 17, 24, 30, 35, 39, 42, 44, 45, 45])

Then just use argmax, say you wanted the index where it passed 40:

>>> np.argmax(b > 40)
6
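
Since np.argmax returns the index of the first True, the minimum number of entries is that index plus one:

>>> np.argmax(b > 40) + 1
7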

Upvotes: 1

yatu

Reputation: 88305

Here's a NumPy-based approach:

(np.cumsum(np.sort(a)[::-1]) < t).sum() + 1

For instance:

a = np.array([1,2,8,5,13,9])

(np.cumsum(np.sort(a)[::-1]) < 25).sum() + 1
# 3

Where:

np.sort(a)[::-1]
# array([13,  9,  8,  5,  2,  1])

And:

np.cumsum(np.sort(a)[::-1])
# array([13, 22, 30, 35, 37, 38])
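
So that:

np.cumsum(np.sort(a)[::-1]) < 25
# array([ True,  True, False, False, False, False])

has two entries below the threshold, and the trailing + 1 accounts for the element that crosses it, giving 3.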

Upvotes: 1
