MManke

Reputation: 133

How can I reduce the time complexity of the solution to my subset optimization problem?

I have a list of tuples (a, b) of two unrelated positive integers. From it I am trying to build another list that maximizes the sum of a while staying above a threshold.

I also have two constant integers weight and sharedWeight.

To check whether the list is above the threshold I calculate (sumOfAllB × (weight − sharedWeight) + sharedWeight) / list.length, which has to equal or exceed my threshold. If I understand it correctly, this is a variation of the knapsack problem with the added complexity that the threshold is not fixed but depends on the number of items in the solution.
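
Spelled out in code (a minimal sketch, with hypothetical values for weight, sharedWeight, and threshold), the validity check is:

```python
# Hypothetical constants; the real values come from the application.
weight = 10
sharedWeight = 2
threshold = 5

# Candidate list of (a, b) tuples.
items = [(10, 4), (7, 10), (3, 6)]

sum_of_all_b = sum(b for _, b in items)
score = (sum_of_all_b * (weight - sharedWeight) + sharedWeight) / len(items)
valid = score >= threshold  # "equal or exceed"
```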

To solve this problem optimally, the following brute-force enumeration works, but it has a time complexity of O(2^n). I am trying to find a much more efficient way to do this.

const exampleQueue = [
  {a: 10, b: 4},
  {a: 7, b: 10},
  {a: 3, b: 6},
  {a: 1, b: 4},
  {a: 1, b: 1},
]
const mappedToA = [10, 7, 3, 1, 1] // does not meet the threshold
const firstAttempt = [10, 7, 3, 1, 0]
const secondAttempt = [10, 7, 3, 0, 1]
const thirdAttempt = [10, 7, 3, 0, 0]
const fourthAttempt = [10, 7, 0, 1, 1]
// ...

One way to improve this would be to first map to b, find the minimum number of elements that need to be removed, and start trying from there. However, after this I am not sure how to further optimize the solution or what to look for to find a better approach.
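
That lower bound can be computed greedily. A minimal sketch, assuming the simplified score sum(b) / n (dropping the smallest b always raises the mean fastest):

```python
def min_removals(bs, threshold):
    """Fewest removals so that the mean of the remaining b values
    reaches the threshold. Returns len(bs) if even the single
    largest b is below the threshold."""
    bs = sorted(bs, reverse=True)  # keep the largest b values
    for d in range(len(bs)):
        kept = bs[:len(bs) - d]
        if sum(kept) / len(kept) >= threshold:
            return d
    return len(bs)
```

With the example b values [4, 10, 6, 4, 1]: a threshold of 6 needs one removal, a threshold of 8 needs three.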

Upvotes: 3

Views: 133

Answers (1)

btilly

Reputation: 46497

I wish you had told us the additional constraint. If it is minor, it could be used in a "build up" version that should be much more efficient. But I can make it work anyway.

The key ideas that I will use are these:

  1. Dynamic programming. Building up a data structure incrementally to avoid repeated recalculation. This turns many problems, including things like subset sum, from exponential to pseudo-polynomial, which in practice is often tractable.
  2. An optimal fringe. We do not need to keep all possibilities. Just the ones that might lead to some optimal tradeoff.
  3. Linked lists. Few realize it, but dynamic programming can efficiently build up a linked list data structure allowing the optimal solution to be extracted. The key is that as you trace back the choices, two different linked lists can merge. And so keeping history on all currently optimal possibilities is much cheaper than it looks.
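
To make idea 1 concrete on a simpler problem (this is not the solution below, just an illustration of the exponential-to-pseudo-polynomial shift on plain subset sum):

```python
def subset_sum(nums, target):
    # reachable holds every sum that some subset of the numbers
    # seen so far can produce; it never exceeds target + 1 entries,
    # so the work is O(n * target) instead of O(2^n).
    reachable = {0}
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable
```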

Let's do our class to keep track of current solution states.

class SolutionState:
    def __init__ (self, tuples=None, prev=None, delete_at=None, threshold=None):
        if prev is None:
            # Base object.
            # Sort values by b increasing, then a decreasing.
            self.tuples = sorted(tuples, key=lambda t: (t[1], -t[0]))
            self.a = sum(t[0] for t in tuples)
            self.b = sum(t[1] for t in tuples)
            self.prev = None
            self.delete_at = delete_at
            self.deleted = 0
            self.threshold = threshold
        else:
            # Links back to previous solutions.
            self.tuples = prev.tuples # Cheap shallow copy!
            t = self.tuples[delete_at]
            self.a = prev.a - t[0]
            self.b = prev.b - t[1]
            self.prev = prev
            self.delete_at = delete_at
            self.deleted = prev.deleted + 1
            if threshold is None:
                self.threshold = prev.threshold
            else:
                self.threshold = threshold

    def solution (self):
        current = self
        reversed_answer = []
        i = len(self.tuples)
        while 0 < i:
            i -= 1
            if current.delete_at is None:
                reversed_answer.append(self.tuples[i])
            elif current.delete_at == i:
                current = current.prev
            else:
                reversed_answer.append(self.tuples[i])
        return list(reversed(reversed_answer))


    def score (self):
        n = len(self.tuples) - self.deleted
        if n == 0:
            # Guarantee valid.
            return self.threshold + 1
        else:
            return self.b / n

    def is_valid (self):
        # "Equal or exceed" the threshold, per the question.
        return self.threshold <= self.score()

As you see, we simply record a solution state as a linked list back through previous solution states, each link deleting one more tuple, until we reach the base state that keeps everything. At every state we know the sum of a, the sum of b, and how many deletions we have made.

We can also recover the exact solution represented. But note: the linked list runs from the last deletion back to the first, so we build the answer from the end to the beginning and then reverse it. That's a potentially confusing technical detail.

I've also incorporated a score and an is_valid based on the rule you gave us.

Now let's talk about our optimal fringe. Suppose that we're at the ith tuple with a potential SolutionState, and that there is another SolutionState with the same number of deletions whose a and b are both at least as good. Then this partial solution cannot produce a better final valid solution: whatever is kept or deleted from this one, do the same to the other. If this one becomes valid, so does the other, with an a that is the same or better.

So at each i, for each number of deletions d, we have an optimal fringe. We can store it sorted by strictly decreasing a and strictly increasing b. The current set of possibilities therefore fits in a data structure that looks like this:

by number of deletions d
   list of SolutionStates, sorted by a descending, and b ascending
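
A minimal sketch of building such a fringe from plain (a, b) pairs (a hypothetical helper, not part of the code below): sort by a descending and keep a pair only when its b exceeds every b already seen at a higher a.

```python
def prune_fringe(pairs):
    # pairs: (a, b) states with the same number of deletions.
    # Keep only the non-dominated ones; the survivors end up with
    # strictly decreasing a and strictly increasing b.
    pairs = sorted(pairs, key=lambda p: (-p[0], -p[1]))
    fringe, best_b = [], float("-inf")
    for a, b in pairs:
        if b > best_b:  # not dominated by any state with a >= this a
            fringe.append((a, b))
            best_b = b
    return fringe
```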

Now how do we handle moving from i to i+1? For each d, we have the previous optimal fringe with d deletions and the one with d−1 deletions. We build a second candidate list by deleting the current element from every state in the d−1 fringe. Then we merge the two lists and keep our new optimal fringe.

And now here is a good trick: if we go by descending d, we can update this data structure in place, because we never need the dth fringe again after we've updated it.

Here is what that code looks like.

def find_optimal (threshold, tuples):
    base = SolutionState(tuples=tuples, threshold=threshold)
    if base.is_valid():
        # Don't think too hard, deleting elements reduces a, so this is best.
        return base.solution()
    else:
        optimal = [[base]]
        best = None
        for i in range(len(tuples)):
            d = len(optimal)
            optimal.append([]) # Add blank optimal fringe.
            while 0 < d:
                fringe_keep = optimal[d]
                fringe_delete = [SolutionState(prev=ss, delete_at=i)
                        for ss in optimal[d-1]]

                # Did we find possible final solutions?
                while 0 < len(fringe_delete) and fringe_delete[-1].is_valid():
                    # Any further deletion reduces a
                    ss = fringe_delete.pop()
                    # Is this an improvement?
                    if best is None or best.a < ss.a:
                        best = ss

                # Merge what's left
                fringe_merge = []
                j = 0
                k = 0
                while j < len(fringe_keep) and k < len(fringe_delete):
                    # Would fringe_delete[k] come after?
                    if fringe_keep[j].b <= fringe_delete[k].b:
                        # Is fringe_keep[j] in the fringe?
                        if fringe_delete[k].a < fringe_keep[j].a:
                            fringe_merge.append(fringe_keep[j])
                        j += 1
                    else:
                        # Check if fringe_delete[k] is kept.
                        if fringe_keep[j].a < fringe_delete[k].a:
                            fringe_merge.append(fringe_delete[k])
                        k += 1

                # Keep remainder of whatever is left.
                while j < len(fringe_keep):
                    fringe_merge.append(fringe_keep[j])
                    j += 1
                while k < len(fringe_delete):
                    fringe_merge.append(fringe_delete[k])
                    k += 1

                # Replace and move down.
                optimal[d] = fringe_merge
                d -= 1

            # Drop any empty fringes at the top; a small optimization.
            while 0 < len(optimal) and 0 == len(optimal[-1]):
                optimal.pop()

        # best is always set; if nothing else, deleting everything is valid.
        return best.solution()

And, finally, here is some test code based on your example.

tuples = [(10, 4), (7, 10), (3, 6), (1, 4), (1, 1)]
for threshold in range(11):
    print(threshold, find_optimal(threshold, tuples))

If you have n tuples and the sum of a is A, then this will find an answer using at most O(n*n*A) time and space. In practice it will usually use considerably less space than that, and the constants on the time tend to be fairly good.

Upvotes: 3
