Efficient iteration over sorted partial sums

Question

I have a list of N positive numbers sorted in ascending order, L[0] to L[N-1].

I want to iterate over subsets of M distinct list elements (without replacement, order not important), 1 <= M <= N, sorted according to their partial sum. M is not fixed, the final result should consider all possible subsets.

I only want the K smallest subsets efficiently (ideally polynomial in K). The obvious algorithm of enumerating all subsets with M <= K is O(K!).

I can reduce the problem to subsets of fixed size M, by placing K iterators (1 <= M <= K) in a min-heap and having the master iterator operate on the heap root.

Essentially I need the Python function call:

sorted(itertools.combinations(L, M), key=sum)[:K]

... but efficient (N ~ 200, K ~ 30), should run in less than 1sec.

Example:

L = [1, 2, 5, 10, 11]
K = 8
answer = [(1,), (2,), (1,2), (5,), (1,5), (2,5), (1,2,5), (10,)]

Answer:

As David's answer shows, the important trick is that for a subset S to be outputted, all subsets of S must have been previously outputted, in particular the subsets where only 1 element has been removed. Thus, every time you output a subset, you can add all 1-element extensions of this subset for consideration (a maximum of K), and still be sure that the next outputted subset will be in the list of all considered subsets up to this point.

Fully working, more efficient Python function:

def sorted_subsets(L, K):
  candidates = [(L[i], (i,)) for i in xrange(min(len(L), K))]

  for j in xrange(K):
    new = candidates.pop(0)
    yield tuple(L[i] for i in new[1])
    new_candidates = [(L[i] + new[0], (i,) + new[1]) for i in xrange(new[1][0])]
    candidates = sorted(candidates + new_candidates)[:K-j-1]

UPDATE, found an O(K log K) algorithm.

This is similar to the trick above, but instead of adding all 1-element extensions with the elements added greater than the max of the subset, you consider only 2 extensions: one that adds max(S)+1, and the other one that shifts max(S) to max(S) + 1 (that would eventually generate all 1-element extensions to the right).

import heapq

def sorted_subsets_faster(L, K):
  candidates = [(L[0], (0,))]

  for j in xrange(K):
    new = heapq.heappop(candidates)
    yield tuple(L[i] for i in new[1])
    i = new[1][-1]
    if i+1 < len(L):
      heapq.heappush(candidates, (new[0] + L[i+1], new[1] + (i+1,)))
      heapq.heappush(candidates, (new[0] - L[i] + L[i+1], new[1][:-1] + (i+1,)))

From my benchmarks, it is faster for ALL values of K.

Also, it is not necessary to supply in advance the value of K, we can just iterate and stop whenever, without changing the efficiency of the algorithm. Also note that the number of candidates is bounded by K+1.

It might be possible to improve even further by using a priority deque (min-max heap) instead of a priority queue, but frankly I'm satisfied with this solution. I'd be interested in a linear algorithm though, or a proof that it's impossible.

David · Accepted Answer

Here's some rough Python-ish pseudo-code:

final = []
L = L[:K]    # Anything after the first K is too big already
sorted_candidates = L[] 
while len( final ) < K:
    final.append( sorted_candidates[0] )  # We keep it sorted so the first option
                                          # is always the smallest sum not
                                          # already included
    # If you just added a subset of size A, make a bunch of subsets of size A+1
    expansion = [sorted_candidates[0].add( x ) 
                   for x in L and x not already included in sorted_candidates[0]]

    # We're done with the first element, so remove it
    sorted_candidates = sorted_candidates[1:]

    # Now go through and build a new set of sorted candidates by getting the
    # smallest possible ones from sorted_candidates and expansion
    new_candidates = []
    for i in range(K - len( final )):
        if sum( expansion[0] ) < sum( sorted_candidates[0] ):
            new_candidates.append( expansion[0] )
            expansion = expansion[1:]
        else:
            new_candidates.append( sorted_candidates[0] )
            sorted_candidates = sorted_candidates[1:]
    sorted_candidates = new_candidates

We'll assume that you will do things like removing the first element of an array in an efficient way, so the only real work in the loop is in building expansion and in rebuilding sorted_candidates. Both of these have fewer than K steps, so as an upper bound, you're looking at a loop that is O(K) and that is run K times, so O(K^2) for the algorithm.

Efficient iteration over sorted partial sums

Answers (1)

Related Questions