Reputation: 11558

How can I improve my algorithm for generating combinations of a multiset?

How can I optimize the next() and hasNext() methods in the following generator which produces combinations of a bounded multiset? (I posted this to C++ as well as Java because the code is C++-compatible and has no Java-specific elements that do not convert directly to C++.

The specific areas of the algorithm which are problematic are the entire hasNext() method which may be unnecessarily complicated, and the line:

if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;

which has an if-statement I think could be removed somehow. I had an earlier version of the algorithm which had some of the backtracking before the return statement and consequently had a much simpler hasNext() test, but I could not get that version to work.

The background of this algorithm is that it is very difficult to find. For example, in Knuth 7.2.1.3 he merely says that it can be done (and gives an exercise to prove that the algorithm is possible), but does not give an algorithm. Likewise, I have half a dozen advanced texts on combinatorics (including Papadimitriou and Kreher/Stimson) and none of them give an algorithm for generating combinations of a multiset. Kreher leaves it as an "exercise for the reader". Anyway, if you can improve the algorithm as above or provide a reference to a working implementation more efficient than mine I would appreciate it. Please give only iterative algorithms (no recursion, please).

/** The iterator returns a 1-based array of integers. When the last combination is reached hasNext() will be false.
  * @param aiItems One-based array containing number of items available for each unique item type where aiItems[0] is the number of item types
  * @param ctSlots  The number of slots into which the items go
  * @return The iterator which generates the 1-based array containing the combinations or null in the event of an error.
  */
public static java.util.Iterator<int[]> combination( final int[] aiItems, final int ctSlots ){ // multiset combination into a limited number of slots
    CombinatoricIterator<int[]> iterator = new CombinatoricIterator<int[]>(){
        int xSlot;
        int xItemType;
        int ctItemType;
        int[] current = new int[ctSlots + 1];
        int[] aiItemsUsed = new int[aiItems[0] + 1];
        { reset(); current[0] = ctSlots; ctItemType = aiItems[0]; }
        public boolean hasNext(){
            int xUseSlot = ctSlots;
            int iCurrentType = ctItemType;
            int ctItemsUsed = 0;
            int ctTotalItemsUsed = 0;
            while( true ){
                int xUsedType = current[xUseSlot];
                if( xUsedType != iCurrentType ) return true;
                ctItemsUsed++;
                ctTotalItemsUsed++;
                if( ctTotalItemsUsed == ctSlots ) return false;
                if( ctItemsUsed == aiItems[xUsedType] ){
                    iCurrentType--;
                    ctItemsUsed = 0;
                }
                xUseSlot--;
            }
        }
        public int[] next(){
            while( true ){
                while( xItemType == ctItemType ){
                    xSlot--;
                    xItemType = current[xSlot];
                }
                xItemType++;
                while( true ){
                    while( aiItemsUsed[xItemType] == aiItems[xItemType] && xItemType != current[xSlot] ){
                        while( xItemType == ctItemType ){
                            xSlot--;
                            xItemType = current[xSlot];
                        }
                        xItemType++;
                    }
                    if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;
                    current[xSlot] = xItemType;
                    aiItemsUsed[xItemType]++;
                    if( xSlot == ctSlots ){
                        return current;
                    }
                    xSlot++;
                }
            }

        }
        public int[] get(){ return current; }
        public void remove(){}
        public void set( int[] current ){ this.current = current; }
        public void setValues( int[] current ){
            if( this.current == null || this.current.length != current.length ) this.current = new int[current.length];
            System.arraycopy( current, 0, this.current, 0, current.length );
        }
        public void reset(){
            xSlot = 1;
            xItemType = 0;
            Arrays.fill( current, 0 ); current[0] = ctSlots;
            Arrays.fill( aiItemsUsed, 0 ); aiItemsUsed[0] = aiItems[0];
        }
    };
    return iterator;
}

ADDITIONAL INFO

Some of the respondents so far seem to not understand the difference between a set and a bounded multiset. A bounded multiset has repeating elements. For example { a, a, b, b, b, c } is a bounded multiset, which would be encoded as { 3, 2, 3, 1 } in my algorithm. Note that the leading "3" is the number of item types (unique items) in the set. If you supply an algorithm, then the following test should produce the output as shown below.

    private static void combination_multiset_test(){
        int[] aiItems = { 4, 3, 2, 1, 1 };
        int iSlots = 4;
        java.util.Iterator<int[]> iterator = combination( aiItems, iSlots );
        if( iterator == null ){
            System.out.println( "null" );
            System.exit( -1 );
        }
        int xCombination = 0;
        while( iterator.hasNext() ){
            xCombination++;
            int[] combination = iterator.next();
            if( combination == null ){
                System.out.println( "improper termination, no result" );
                System.exit( -1 );
            }
            System.out.println( xCombination + ": " + Arrays.toString( combination ) );
        }
        System.out.println( "complete" );
    }


1: [4, 1, 1, 1, 2]
2: [4, 1, 1, 1, 3]
3: [4, 1, 1, 1, 4]
4: [4, 1, 1, 2, 2]
5: [4, 1, 1, 2, 3]
6: [4, 1, 1, 2, 4]
7: [4, 1, 1, 3, 4]
8: [4, 1, 2, 2, 3]
9: [4, 1, 2, 2, 4]
10: [4, 1, 2, 3, 4]
11: [4, 2, 2, 3, 4]
complete

Upvotes: 7

Answers (3)

Apiwat Chantawibul

Reputation: 1463

EDIT: done adjusting answer according to the clarified question

Main idea: again, the resulting selection can be encoded similar to a custom numeral system. One could increment a counter and interpret that counter as a selection.

However, since there is an additional restriction of the size of selection == target. A naive way to implement the restriction is to just check the size of resulting selection and skip ones that does not satisfy the restriction. But that is slow.

So all I did was to do a little cleverer increment that jump to the selection with correct size directly.

Sorry the code is in Python. But I did it in a way comparable to Java iterator interface. The input & output format is:

haves[i] := multiplicity of the i-th item in the collection
target := output collection must have this size

The code:

class Perm(object):
    def __init__(self,items,haves,target):
        assert sum(haves) >= target
        assert all(h > 0 for h in haves)
        self.items = items
        self.haves = haves
        self.target = target
        self.ans = None
        self.stop = False
    def __iter__(self):
        return self
    def reset(self):
        self.ans = [0]*len(self.haves)
        self.__fill(self.target)
        self.stop = False
    def __fill(self,n):
        """fill ans from LSB with n bits"""
        if n <= 0: return
        i = 0
        while n > self.haves[i]:
            assert self.ans[i] == 0
            self.ans[i] = self.haves[i]
            n -= self.haves[i]
            i += 1
        assert self.ans[i] == 0
        self.ans[i] = n
    def __inc(self):
        """increment from LSB, carry when 'target' or 'haves' constrain is broken"""
        # in fact, the 'target' constrain is always broken on the left most non-zero entry
        # find left most non-zero
        i = 0
        while self.ans[i] == 0:
            i += 1
        # set it to zero
        l = self.ans[i]
        self.ans[i] = 0
        # do increment answer, and carry
        while True:
            # increment to the next entry, if possible
            i += 1
            if i >= len(self.ans):
                self.stop = True
                raise StopIteration
            #
            if self.ans[i] == self.haves[i]:
                l += self.ans[i]
                self.ans[i] = 0
            else:
                l -= 1
                self.ans[i] += 1
                break
        return l
    def next(self):
        if self.stop:
            raise StopIteration
        elif self.ans is None:
            self.reset()
        else:
            l = self.__inc()
            self.__fill(l)
        return self.ans

Note that the items argument isn't really used.

The assert in the __init__ is to make clear of my assumption about the input.

The assert in the __fill is to just show a convenient property of self.ans in the context that __fill is called.

Here is a nice skeleton for testing the code:

test_cases = [([3,2,1], 3),
              ([3,2,1], 5),
              ([3,2,1], 6),
              ([4,3,2,1,1], 4),
              ([1,3,1,2,4], 4),
             ]

P = Perm(None,*test_cases[-1])
for p in P:
    print p
    #raw_input()

Example result from the input ([1,3,1,2,4], 4):

[1, 3, 0, 0, 0]
[1, 2, 1, 0, 0]
[0, 3, 1, 0, 0]
[1, 2, 0, 1, 0]
[0, 3, 0, 1, 0]
[1, 1, 1, 1, 0]
[0, 2, 1, 1, 0]
[1, 1, 0, 2, 0]
[0, 2, 0, 2, 0]
[1, 0, 1, 2, 0]
[0, 1, 1, 2, 0]
[1, 2, 0, 0, 1]
[0, 3, 0, 0, 1]
[1, 1, 1, 0, 1]
[0, 2, 1, 0, 1]
[1, 1, 0, 1, 1]
[0, 2, 0, 1, 1]
[1, 0, 1, 1, 1]
[0, 1, 1, 1, 1]
[1, 0, 0, 2, 1]
[0, 1, 0, 2, 1]
[0, 0, 1, 2, 1]
[1, 1, 0, 0, 2]
[0, 2, 0, 0, 2]
[1, 0, 1, 0, 2]
[0, 1, 1, 0, 2]
[1, 0, 0, 1, 2]
[0, 1, 0, 1, 2]
[0, 0, 1, 1, 2]
[0, 0, 0, 2, 2]
[1, 0, 0, 0, 3]
[0, 1, 0, 0, 3]
[0, 0, 1, 0, 3]
[0, 0, 0, 1, 3]
[0, 0, 0, 0, 4]

Performance Each next() call takes O(h) where h is the number of types of items (size of haves list).

Upvotes: 1

Yakk - Adam Nevraumont

Reputation: 275740

I'd write a simple helper class that does increment, highbit and for_each_bit.

I'd first wrap an unsigned int, and limit it to 32 bits, and maybe extend it by std::bitset or a std::vector<uint32_t> if I was feeling ambitious -- but by having these 3 methods to start, I can test it and get it working.

increment is easy, esp on a naked 32 bit int.

highbit returns the bit position of the highest set bit.

for_each_bit has this signature in C++:

template<typename Lambda>
void for_each_bit( my_bignum const& num, Lambda&& func )

and it calls func with the index of each set bit in num.

That should take at most a few minutes to write up.

Throw away hasNext, follow the iterator concept -- you have a begin subset and an end subset, and the end is not valid to extract the value of. Dereferencing these iterators produces the subset in question (or produces a factory for said subset).

end is now easy to work out -- if highbit is >= the number of elements in your set, you are past the end of the permutations.

begin is either zero, or 1, depending on if you want the empty subset to be included.

next just increments your bignum.

Producing the subset simply involves calling for_each_bit, and putting that item from your set into the subset.

Next, improve increment to allow random access, and you can then implement iterating over the subsets in parallel!

This solves the set problem. To solve the mutltiset problem, first solve the derived set problem (where you pretend there is only 0 or 1 of each element), and iterate over that. Then, on each iteration of the derived set, build up a std::vector of the max count of each element.

Then do something like this:

#include <utility>
#include <cstddef>
#include <vector>

using std::size_t;

namespace details {
template<typename Lambda>
  void for_each_multiset_combo_worker( std::vector<size_t> const& counts, Lambda&& lambda, std::vector<size_t>& indexes, std::vector<size_t>& current )
  {
    if (depth >= counts.size()) {
      lambda( current );
      return;
    }
    for (size_t i = 0; i <= counts[depth]; ++i) {
      // Assert: current.size() == depth
      current.push_back(i);
      // Assert: current.back() == i
      // Assert: current.size() == dpeth+1
      for_each_multiset_combo_worker( counts, lambda, depth+1, current );
      // Assert: current.back() == i
      // Assert: current.size() == dpeth+1
      current.pop_back();
      // Assert: current.size() == depth
    }
  }
}
template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
  std::vector<size_t> current;
  current.reserve( counts.size() );
  details::for_each_multiset_combo_worker( counts, std::forward<Lambda>(lambda), 0, current );
}
#include <iostream>

int main() {
  std::vector<size_t> multiset = {3, 2, 1, 1};
  size_t counter = 0;
  for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
    std::cout << counter << ": [";
    for(auto it = counts.begin(); it != counts.end(); ++it) {
      if (it != counts.begin()) {
        std::cout << ", ";
      }
      std::cout << *it;
    }
    std::cout << "]\n";
    ++counter;
  });
}

Live example: http://ideone.com/8GN1xx

In this live example, I skipped the optimization of doing the set iteration first, and instead directly iterated over the multiset.

(Limitations: no more than max size_t elements of each type, and no more than max capacity of std::vector different types of elements).

I have no need for the leading "number of distinct elements in the multiset", so I didn't use it.

And here is the iterative version of the above recursive algorithm, using the usual "turn the implicit recursion stack into an explicit iteration stack" technique:

#include <utility>
#include <cstddef>
#include <vector>

using std::size_t;

template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
  // below code is easier if I assume counts is non-empty:
  if (counts.empty())
  {
    lambda(counts);
    return;
  }
  // preallocate a buffer big enough to hold the output counts:
  std::vector<size_t> indexes;
  indexes.reserve( counts.size() );
  while(true) {
    // append 0s on the end of indexes if we have room:
    while (indexes.size() < counts.size()) {
      indexes.push_back(0);
    }
    // at this point, we have a unique element.  Pass it to the passed in lambda:
    lambda( indexes );
    // The advancement logic.  Advance the highest index.  If that overflows, pop it and
    // advance the next highest index:
    indexes.back()++;
    while (indexes.back() > counts[indexes.size()-1]) {
      indexes.pop_back();
      // we are done if we have managed to advance every index, and there are none left to advance:
      if (indexes.empty())
        return; // finished
      indexes.back()++;
    }
  }
}
#include <iostream>

int main() {
  std::vector<size_t> multiset = {3, 2, 1, 1};
  size_t counter = 0;
  for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
    std::cout << counter << ": [";
    for(auto it = counts.begin(); it != counts.end(); ++it) {
      if (it != counts.begin()) {
        std::cout << ", ";
      }
      std::cout << *it;
    }
    std::cout << "]\n";
    ++counter;
  });
}

http://ideone.com/x2Zp2f

Upvotes: 1

Zim-Zam O'Pootertoot

Reputation: 18148

This paper provides an efficient iterative algorithm for generating multiset permutations on page 8

This paper provides another iterative algorithm, also on page 8

Upvotes: 0

How can I improve my algorithm for generating combinations of a multiset?

Answers (3)

Related Questions