Reputation: 612

Generate combinations ordered by an attribute

I'm looking for a way to generate combinations of objects ordered by a single attribute. I don't think lexicographical order is what I'm looking for... I'll try to give an example. Let's say I have a list of objects A,B,C,D with the attribute values I want to order by being 3,3,2,1. This gives A3, B3, C2, D1 objects. Now I want to generate combinations of 2 objects, but they need to be ordered in a descending way:

A3 B3
A3 C2
B3 C2
A3 D1
B3 D1
C2 D1

Generating all combinations and sorting them is not acceptable because the real world scenario involves large sets and millions of combinations. (set of 40, order of 8), and I need only combinations above the certain threshold.

Actually I need count of combinations above a threshold grouped by a sum of a given attribute, but I think it is far more difficult to do - so I'd settle for developing all combinations above a threshold and counting them. If that's possible at all.

EDIT - My original question wasn't very precise... I don't actually need these combinations ordered, just thought it would help to isolate combinations above a threshold. To be more precise, in the above example, giving a threshold of 5, I'm looking for an information that the given set produces 1 combination with a sum of 6 ( A3 B3 ) and 2 with a sum of 5 ( A3 C2, B3 C2). I don't actually need the combinations themselves.

I was looking into subset-sum problem, but if I understood correctly given dynamic solution it will only give you information is there a given sum or no, not count of the sums.

Thanks

Upvotes: 4

Answers (6)

Bob Bryan

Reputation: 3837

I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:

Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

Upvotes: 1

Zach Scrivena

Reputation: 29539

Here's a recursive approach to count the number of these subsets: We define a function count(minIndex,numElements,minSum) that returns the number of subsets of size numElements whose sum is at least minSum, containing elements with indices minIndex or greater.

As in the problem statement, we sort our elements in descending order, e.g. [3,3,2,1], and call the first index zero, and the total number of elements N. We assume all elements are nonnegative. To find all 2-subsets whose sum is at least 5, we call count(0,2,5).

Sample Code (Java):

int count(int minIndex, int numElements, int minSum)
{
    int total = 0;

    if (numElements == 1)
    {
        // just count number of elements >= minSum
        for (int i = minIndex; i <= N-1; i++)
            if (a[i] >= minSum) total++; else break;
    }
    else
    {
        if (minSum <= 0)
        {
            // any subset will do (n-choose-k of them)
            if (numElements <= (N-minIndex))
                total = nchoosek(N-minIndex, numElements);
        }
        else
        {
            // add element a[i] to the set, and then consider the count
            // for all elements to its right
            for (int i = minIndex; i <= (N-numElements); i++)
                total += count(i+1, numElements-1, minSum-a[i]);
        }
    }

    return total;
}

Btw, I've run the above with an array of 40 elements, and size-8 subsets and consistently got back results in less than a second.

Upvotes: 0

joel.neely

Reputation: 30943

Actually, I think you do want lexicographic order, but descending rather than ascending. In addition:

It's not clear to me from your description that A, B, ... D play any role in your answer (except possibly as the container for the values).
I think your question example is simply "For each integer at least 5, up to the maximum possible total of two values, how many distinct pairs from the set {3, 3, 2, 1} have sums of that integer?"
The interesting part is the early bailout, once no possible solution can be reached (remaining achievable sums are too small).

~~I'll post sample code later.~~

Here's the sample code I promised, with a few remarks following:

public class Combos {

    /* permanent state for instance */
    private int values[];
    private int length;

    /* transient state during single "count" computation */
    private int n;
    private int limit;
    private Tally<Integer> tally;
    private int best[][];  // used for early-bail-out

    private void initializeForCount(int n, int limit) {
        this.n = n;
        this.limit = limit;
        best = new int[n+1][length+1];
        for (int i = 1; i <= n; ++i) {
            for (int j = 0; j <= length - i; ++j) {
                best[i][j] = values[j] + best[i-1][j+1];
            }
        }
    }

    private void countAt(int left, int start, int sum) {
        if (left == 0) {
            tally.inc(sum);
        } else {
            for (
                int i = start;
                i <= length - left
                && limit <= sum + best[left][i];  // bail-out-check
                ++i
            ) {
                countAt(left - 1, i + 1, sum + values[i]);
            }
        }
    }

    public Tally<Integer> count(int n, int limit) {
        tally = new Tally<Integer>();
        if (n <= length) {
            initializeForCount(n, limit);
            countAt(n, 0, 0);
        }
        return tally;
    }

    public Combos(int[] values) {
        this.values = values;
        this.length = values.length;
    }

}

Preface remarks:

This uses a little helper class called Tally, that just isolates the tabulation (including initialization for never-before-seen keys). I'll put it at the end.

To keep this concise, I've taken some shortcuts that aren't good practice for "real" code:

This doesn't check for a null value array, etc.
I assume that the value array is already sorted into descending order, required for the early-bail-out technique. (Good production code would include the sorting.)
I put transient data into instance variables instead of passing them as arguments among the private methods that support count. That makes this class non-thread-safe.

Explanation:

An instance of Combos is created with the (descending ordered) array of integers to combine. The value array is set up once per instance, but multiple calls to count can be made with varying population sizes and limits.

The count method triggers a (mostly) standard recursive traversal of unique combinations of n integers from values. The limit argument gives the lower bound on sums of interest.

The countAt method examines combinations of integers from values. The left argument is how many integers remain to make up n integers in a sum, start is the position in values from which to search, and sum is the partial sum.

The early-bail-out mechanism is based on computing best, a two-dimensional array that specifies the "best" sum reachable from a given state. The value in best[n][p] is the largest sum of n values beginning in position p of the original values.

The recursion of countAt bottoms out when the correct population has been accumulated; this adds the current sum (of n values) to the tally. If countAt has not bottomed out, it sweeps the values from the start-ing position to increase the current partial sum, as long as:

enough positions remain in values to achieve the specified population, and
the best (largest) subtotal remaining is big enough to make the limit.

A sample run with your question's data:

    int[] values = {3, 3, 2, 1};
    Combos mine = new Combos(values);
    Tally<Integer> tally = mine.count(2, 5);
    for (int i = 5; i < 9; ++i) {
        int n = tally.get(i);
        if (0 < n) {
            System.out.println("found " + tally.get(i) + " sums of " + i);
        }
    }

produces the results you specified:

found 2 sums of 5
found 1 sums of 6

Here's the Tally code:

public static class Tally<T> {
    private Map<T,Integer> tally = new HashMap<T,Integer>();
    public Tally() {/* nothing */}
    public void inc(T key) {
        Integer value = tally.get(key);
        if (value == null) {
            value = Integer.valueOf(0);
        }
        tally.put(key, (value + 1));
    }
    public int get(T key) {
        Integer result = tally.get(key);
        return result == null ? 0 : result;
    }
    public Collection<T> keys() {
        return tally.keySet();
    }
}

Upvotes: 2

Jeffrey Cameron

Reputation: 10285

If you're using C# there is a fairly good generics library here. Note though that the generation of some permutations is not in lexicographic order

Upvotes: 0

Niyaz

Reputation: 54793

I am extremely sorry (after all those clarifications in the comments) to say that I could not find an efficient solution to this problem. I tried for the past hour with no results.

The reason (I think) is that this problem is very similar to problems like the traveling salesman problem. Until unless you try all the combinations, there is no way to know which attributes will add upto the threshold.

There seems to be no clever trick that can solve this class of problems.

Still there are many optimizations that you can do to the actual code.

Try sorting the data according to the attributes. You may be able to avoid processing some values from the list when you find that a higher value cannot satisfy the threshold (so all lower values can be eliminated).

Upvotes: 0

simon

Reputation: 5915

Check out this question in stackoverflow: Algorithm to return all combinations

I also just used a the java code below to generate all permutations, but it could easily be used to generate unique combination's given an index.

public static <E> E[] permutation(E[] s, int num) {//s is the input elements array and num is the number which represents the permutation

    int factorial = 1;

    for(int i = 2; i < s.length; i++)
        factorial *= i;//calculates the factorial of (s.length - 1)

    if (num/s.length >= factorial)// Optional. if the number is not in the range of [0, s.length! - 1] 
        return null;

    for(int i = 0; i < s.length - 1; i++){//go over the array

        int tempi = (num / factorial) % (s.length - i);//calculates the next cell from the cells left (the cells in the range [i, s.length - 1])
        E temp = s[i + tempi];//Temporarily saves the value of the cell needed to add to the permutation this time 

        for(int j = i + tempi; j > i; j--)//shift all elements to "cover" the "missing" cell
            s[j] = s[j-1];

        s[i] = temp;//put the chosen cell in the correct spot

        factorial /= (s.length - (i + 1));//updates the factorial

    }

    return s;
}

Upvotes: 0

Generate combinations ordered by an attribute

Answers (6)

Preface remarks:

Explanation:

Related Questions