Fast Python algorithm for random partitioning with subset sums equal or close to given ratios

Question

This question is an extension of my previous question: Fast python algorithm to find all possible partitions from a list of numbers that has subset sums equal to a ratio. I want to divide a list of numbers into subsets so that the ratios of the subset sums equal given values. The difference is that I now have a long list of 200 numbers, so enumerating all partitions is infeasible. Note that although the list of course contains repeated values, every number is distinguishable.

import random
lst = [random.randrange(10) for _ in range(200)]

In this case, I want a function that stochastically samples a given number of partitions whose subset sums are equal or close to the given ratios. This means that the solution can be sub-optimal, but I need the algorithm to be fast. I guess a greedy algorithm will do. That said, it would of course be even better if there were a relatively fast algorithm that gives the optimal solution.

For example, I want to sample 100 partitions, all with subset sum ratios of 4 : 3 : 3. Duplicate partitions are allowed but should be very unlikely for such a long list. The function should be used like this:

partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)

To test the solution, you can do something like:

from math import isclose
ratios = [4, 3, 3]
eps = 0.05
assert all([isclose(ratios[i] / sum(ratios), sum(x) / sum(lst), abs_tol=eps)
            for part in partitions for i, x in enumerate(part)])

Any suggestions?

EvilTak · Accepted Answer

You can use a greedy heuristic that builds each of the num_gen partitions from a fresh random permutation of the list. Each random permutation is split into len(ratios) contiguous sublists. Because the partition subsets are sublists of a permutation, the ratio condition is easy to enforce while the sublists are being built: as soon as the running sum of the permutation reaches the cumulative target for the current ratio, we "complete" the current sublist, add it to the partition and start a new sublist. We can do this in one pass through the entire permutation, giving us the following algorithm of time complexity O(num_gen * len(lst)).

import random

# example inputs from the question
lst = [random.randrange(10) for _ in range(200)]
ratios = [4, 3, 3]

# number of partitions to generate (num_gen)
M = 100

N = len(lst)
P = len(ratios)
R = sum(ratios)
S = sum(lst)

# generated partitions, one per iteration
partitions = []

for _ in range(M):
    # get a new random permutation (shuffles lst in place)
    random.shuffle(lst)
    
    partition = []
    
    # starting index (in the permutation) of the current sublist
    lo = 0
    # permutation partial sum
    s = 0
    # index of sublist we are currently generating (i.e. what ratio we are on)
    j = 0
    # ratio partial sum
    rs = ratios[j]
    
    for i in range(N):
        s += lst[i]
        
        # if the fraction s / S of the permutation partial sum reaches the
        # fraction rs / R of the ratio partial sum, the current sublist is "complete"
        if s / S >= rs / R:
            # slicing copies, so later shuffles do not affect stored sublists
            partition.append(lst[lo:i + 1])
            # start creating new sublist from next element
            lo = i + 1
            j += 1
            if j == P:
                # done with partition
                # remaining elements will always all be zeroes 
                # (i.e. assert should never fail)
                assert all(x == 0 for x in lst[i+1:])
                partition[-1].extend(lst[i+1:])
                break
            rs += ratios[j]
    
    partitions.append(partition)

Note that, for more robustness, the outer loop can be redesigned to loop indefinitely until num_gen good partitions are generated (rather than just looping num_gen times). This algorithm is expected to produce M good partitions in O(M) iterations (provided random.shuffle is sufficiently random) if the number of good partitions is not too small compared to the total number of partitions of the same size, so it should perform well for most inputs. For an (almost) uniformly random list like [random.randrange(10) for _ in range(200)], every iteration produces a good partition with eps = 0.05, as is evident from running the example below. Of course, how well the algorithm performs also depends on the definition of 'good': the stricter the closeness requirement (in other words, the smaller the epsilon), the more iterations it will take to find a good partition. This implementation can be found here, and will work for any input (assuming random.shuffle eventually produces all permutations of the input list).
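The retry idea described above could be sketched roughly as follows. This is only an illustration, not the linked implementation: the names split_once, is_good and sample_partitions are hypothetical, the max_tries cap is an added assumption so the loop cannot run forever, and the input is assumed to contain non-negative numbers with a positive sum.

```python
import random
from math import isclose


def split_once(perm, ratios):
    """One greedy pass: cut perm into contiguous sublists at the cumulative ratio targets."""
    S, R = sum(perm), sum(ratios)
    parts, lo, s, j, rs = [], 0, 0, 0, ratios[0]
    for i, x in enumerate(perm):
        s += x
        if s / S >= rs / R:
            parts.append(perm[lo:i + 1])  # slicing copies the elements
            lo = i + 1
            j += 1
            if j == len(ratios):
                # anything left over only contributes zero to the sum
                parts[-1].extend(perm[lo:])
                break
            rs += ratios[j]
    return parts


def is_good(parts, ratios, total, eps):
    """Same closeness test as in the question, plus a check on the number of parts."""
    if len(parts) != len(ratios):
        return False
    R = sum(ratios)
    return all(isclose(r / R, sum(p) / total, abs_tol=eps)
               for r, p in zip(ratios, parts))


def sample_partitions(numbers, ratios, num_gen, eps=0.05, max_tries=100_000):
    """Keep shuffling until num_gen good partitions are found or max_tries is hit."""
    perm = list(numbers)  # work on a copy so the caller's list is not shuffled
    total = sum(perm)     # assumed to be positive
    good = []
    for _ in range(max_tries):  # max_tries is an assumed safety cap
        if len(good) == num_gen:
            break
        random.shuffle(perm)
        parts = split_once(perm, ratios)
        if is_good(parts, ratios, total, eps):
            good.append(parts)
    return good


lst = [random.randrange(10) for _ in range(200)]
partitions = sample_partitions(numbers=lst, ratios=[4, 3, 3], num_gen=100)
```

With a small eps the function may return fewer than num_gen partitions once max_tries is exhausted, so callers may want to check the length of the result.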

You can find a runnable version of the code (with asserts to test how "good" the partitions are) here.
