Dynamic Programming Implementation of Unique subset generation

Question

I came across this problem where we need to find the total number of ways in which you can form a subset of n(where n is the input by the user). The subset conditions are : the numbers in the subset should be distinct and the numbers in the subset should be in decreasing order.

Example: Given n =7 ,the output is 4 because the possible combinations are (6,1)(5,2)(4,3)(4,2,1). Note that though (4,1,1,1) also add upto 7,however it has repeating numbers. Hence it is not a valid subset.

I have solved this problem using the backtracking method which has exponential complexity. However, I am trying to figure out how can we solve this problem using Dynamic Prog ? The nearest I could come up with is the series : for n= 0,1,2,3,4,5,6,7,8,9,10 for which the output is 0,0,0,1,1,2,3,4,5,7,9 respectively. However, I am not able to come up with a general formula that I can use to calculate f(n) . While going through the internet I came across this kind of integer sequences in https://oeis.org/search?q=0%2C0%2C0%2C1%2C1%2C2%2C3%2C4%2C5%2C7%2C9%2C11%2C14&sort=&language=&go=Search . But I am unable to understand the formula they have provided here. Any help would be appreciated. Any other insights that improves exponential complexity is also appreciated.

user6732794 · Accepted Answer

What you call "unique subset generation" is better known as integer partitions with distinct parts. Mathworld has an entry on the Q function, which counts the number of partitions with distinct parts.

However, your function does not count trivial partitions (i.e. n -> (n)), so what you're looking for is actually Q(n) - 1, which generates the sequence that you linked in your question--the number of partitions into at least 2 distinct parts. This answer to a similar question on Mathematics contains an efficient algorithm in Java for computing the sequence up to n = 200, which easily can be adapted for larger values.

Here's a combinatorial explanation of the referenced algorithm:

Let's start with a table of all the subsets of {1, 2, 3}, grouped by their sums:

 index (sum)    partitions        count
     0           ()                 1
     1           (1)                1 
     2           (2)                1
     3           (1+2), (3)         2
     4           (1+3)              1
     5           (2+3)              1
     6           (1+2+3)            1

Suppose we want to construct a new table of all of subsets of {1, 2, 3, 4}. Notice that every subset of {1, 2, 3} is also a subset of {1, 2, 3, 4}, so each subset above will appear in our new table. In fact, we can divide our new table into two equally sized categories: subsets which do not include 4, and those that do. What we can do is start from the table above, copy it over, and then "extend" it with 4. Here's the table for {1, 2, 3, 4}:

 index (sum)   partitions        count
     0          ()                 1
     1          (1)                1 
     2          (2)                1
     3          (1+2), (3)         2
     4          (1+3), [4]         2
     5          (2+3), [1+4]       2
     6          (1+2+3),[2+4]      2
     7          [1+2+4],[3+4]      2
     8          [1+3+4]            1
     9          [2+3+4]            1
    10          [1+2+3+4]          1

All the subsets that include 4 are surrounded by square brackets, and they are formed by adding 4 to exactly one of the old subsets which are surrounded by parentheses. We can repeat this process and build the table for {1,2,..,5}, {1,2,..,6}, etc.

But we don't actually needed to store the actual subsets/partitions, we just need the counts for each index/sum. For example, if with have the table for {1, 2, 3} with only counts, we can build the count-table for {1, 2, 3, 4} by taking each (index, count) pair from the old table, and adding count to the the current count for index + 4. The idea is that, say, if there are two subsets of {1, 2, 3} that sum to 3, then adding 4 to each of those two subsets will make two new subsets for 7.

With this in mind, here's a Python implementation of this process:

def c(n):
    counts = [1]

    for k in range(1, n + 1):
        new_counts = counts[:] + [0]*k

        for index, count in enumerate(counts):
            new_counts[index + k] += count

        counts = new_counts

    return counts

We start with the table for {}, which has just one subset--the empty set-- which sums to zero. Then we build the table for {1}, then for {1, 2}, ..., all the way up to {1,2,..,n}. Once we're done, we'll have counted every partition for n, since no partition of n can include integers larger than n.

Now, we can make 2 major optimizations to the code:

Limit the table to only include entries up to n, since that's all we're interested in. If index + k exceeds n, then we just ignore it. We can even preallocate space for the final table up front, rather than growing bigger tables on each iteration.
Instead of building a new table from scratch on each iteration, if we carefully iterate over the old table backwards, we can actually update it in-place without messing up any of the new values.

With these optimizations, you effectively have the same algorithm from the Mathematics answer that was referenced earlier, which was motivated by generating functions. It runs in O(n^2) time and takes only O(n) space. Here's what it looks like in Python:

def c(n):
    counts = [1] + [0]*n

    for k in range(1, n + 1):
        for i in reversed(range(k, n + 1)):
            counts[i] += counts[i - k]

    return counts

def Q(n):
    "-> the number of subsets of {1,..,n} that sum to n."
    return c(n)[n]

def f(n):
    "-> the number of subsets of {1,..,n} of size >= 2 that sum to n."
    return Q(n) - 1

Dynamic Programming Implementation of Unique subset generation

Answers (1)

Related Questions