Reputation: 11558
How can I optimize the next()
and hasNext()
methods in the following generator which produces combinations of a bounded multiset? (I posted this to C++ as well as Java because the code is C++-compatible and has no Java-specific elements that do not convert directly to C++.
The specific areas of the algorithm which are problematic are the entire hasNext()
method which may be unnecessarily complicated, and the line:
if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;
which has an if-statement I think could be removed somehow. I had an earlier version of the algorithm which had some of the backtracking before the return statement and consequently had a much simpler hasNext()
test, but I could not get that version to work.
The background of this algorithm is that it is very difficult to find. For example, in Knuth 7.2.1.3 he merely says that it can be done (and gives an exercise to prove that the algorithm is possible), but does not give an algorithm. Likewise, I have half a dozen advanced texts on combinatorics (including Papadimitriou and Kreher/Stimson) and none of them give an algorithm for generating combinations of a multiset. Kreher leaves it as an "exercise for the reader". Anyway, if you can improve the algorithm as above or provide a reference to a working implementation more efficient than mine I would appreciate it. Please give only iterative algorithms (no recursion, please).
/** The iterator returns a 1-based array of integers. When the last combination is reached hasNext() will be false.
* @param aiItems One-based array containing number of items available for each unique item type where aiItems[0] is the number of item types
* @param ctSlots The number of slots into which the items go
* @return The iterator which generates the 1-based array containing the combinations or null in the event of an error.
*/
public static java.util.Iterator<int[]> combination( final int[] aiItems, final int ctSlots ){ // multiset combination into a limited number of slots
CombinatoricIterator<int[]> iterator = new CombinatoricIterator<int[]>(){
int xSlot;
int xItemType;
int ctItemType;
int[] current = new int[ctSlots + 1];
int[] aiItemsUsed = new int[aiItems[0] + 1];
{ reset(); current[0] = ctSlots; ctItemType = aiItems[0]; }
public boolean hasNext(){
int xUseSlot = ctSlots;
int iCurrentType = ctItemType;
int ctItemsUsed = 0;
int ctTotalItemsUsed = 0;
while( true ){
int xUsedType = current[xUseSlot];
if( xUsedType != iCurrentType ) return true;
ctItemsUsed++;
ctTotalItemsUsed++;
if( ctTotalItemsUsed == ctSlots ) return false;
if( ctItemsUsed == aiItems[xUsedType] ){
iCurrentType--;
ctItemsUsed = 0;
}
xUseSlot--;
}
}
public int[] next(){
while( true ){
while( xItemType == ctItemType ){
xSlot--;
xItemType = current[xSlot];
}
xItemType++;
while( true ){
while( aiItemsUsed[xItemType] == aiItems[xItemType] && xItemType != current[xSlot] ){
while( xItemType == ctItemType ){
xSlot--;
xItemType = current[xSlot];
}
xItemType++;
}
if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;
current[xSlot] = xItemType;
aiItemsUsed[xItemType]++;
if( xSlot == ctSlots ){
return current;
}
xSlot++;
}
}
}
public int[] get(){ return current; }
public void remove(){}
public void set( int[] current ){ this.current = current; }
public void setValues( int[] current ){
if( this.current == null || this.current.length != current.length ) this.current = new int[current.length];
System.arraycopy( current, 0, this.current, 0, current.length );
}
public void reset(){
xSlot = 1;
xItemType = 0;
Arrays.fill( current, 0 ); current[0] = ctSlots;
Arrays.fill( aiItemsUsed, 0 ); aiItemsUsed[0] = aiItems[0];
}
};
return iterator;
}
ADDITIONAL INFO
Some of the respondents so far seem to not understand the difference between a set and a bounded multiset. A bounded multiset has repeating elements. For example { a, a, b, b, b, c } is a bounded multiset, which would be encoded as { 3, 2, 3, 1 } in my algorithm. Note that the leading "3" is the number of item types (unique items) in the set. If you supply an algorithm, then the following test should produce the output as shown below.
private static void combination_multiset_test(){
int[] aiItems = { 4, 3, 2, 1, 1 };
int iSlots = 4;
java.util.Iterator<int[]> iterator = combination( aiItems, iSlots );
if( iterator == null ){
System.out.println( "null" );
System.exit( -1 );
}
int xCombination = 0;
while( iterator.hasNext() ){
xCombination++;
int[] combination = iterator.next();
if( combination == null ){
System.out.println( "improper termination, no result" );
System.exit( -1 );
}
System.out.println( xCombination + ": " + Arrays.toString( combination ) );
}
System.out.println( "complete" );
}
1: [4, 1, 1, 1, 2]
2: [4, 1, 1, 1, 3]
3: [4, 1, 1, 1, 4]
4: [4, 1, 1, 2, 2]
5: [4, 1, 1, 2, 3]
6: [4, 1, 1, 2, 4]
7: [4, 1, 1, 3, 4]
8: [4, 1, 2, 2, 3]
9: [4, 1, 2, 2, 4]
10: [4, 1, 2, 3, 4]
11: [4, 2, 2, 3, 4]
complete
Upvotes: 7
Views: 955
Reputation: 1463
EDIT: done adjusting answer according to the clarified question
Main idea: again, the resulting selection can be encoded similar to a custom numeral system. One could increment a counter and interpret that counter as a selection.
However, since there is an additional restriction of the size of selection == target
.
A naive way to implement the restriction is to just check the size of resulting selection and skip ones that does not satisfy the restriction. But that is slow.
So all I did was to do a little cleverer increment that jump to the selection with correct size directly.
Sorry the code is in Python. But I did it in a way comparable to Java iterator interface. The input & output format is:
haves[i] := multiplicity of the i-th item in the collection
target := output collection must have this size
The code:
class Perm(object):
def __init__(self,items,haves,target):
assert sum(haves) >= target
assert all(h > 0 for h in haves)
self.items = items
self.haves = haves
self.target = target
self.ans = None
self.stop = False
def __iter__(self):
return self
def reset(self):
self.ans = [0]*len(self.haves)
self.__fill(self.target)
self.stop = False
def __fill(self,n):
"""fill ans from LSB with n bits"""
if n <= 0: return
i = 0
while n > self.haves[i]:
assert self.ans[i] == 0
self.ans[i] = self.haves[i]
n -= self.haves[i]
i += 1
assert self.ans[i] == 0
self.ans[i] = n
def __inc(self):
"""increment from LSB, carry when 'target' or 'haves' constrain is broken"""
# in fact, the 'target' constrain is always broken on the left most non-zero entry
# find left most non-zero
i = 0
while self.ans[i] == 0:
i += 1
# set it to zero
l = self.ans[i]
self.ans[i] = 0
# do increment answer, and carry
while True:
# increment to the next entry, if possible
i += 1
if i >= len(self.ans):
self.stop = True
raise StopIteration
#
if self.ans[i] == self.haves[i]:
l += self.ans[i]
self.ans[i] = 0
else:
l -= 1
self.ans[i] += 1
break
return l
def next(self):
if self.stop:
raise StopIteration
elif self.ans is None:
self.reset()
else:
l = self.__inc()
self.__fill(l)
return self.ans
Note that the items
argument isn't really used.
The assert
in the __init__
is to make clear of my assumption about the input.
The assert
in the __fill
is to just show a convenient property of self.ans
in the context that __fill
is called.
Here is a nice skeleton for testing the code:
test_cases = [([3,2,1], 3),
([3,2,1], 5),
([3,2,1], 6),
([4,3,2,1,1], 4),
([1,3,1,2,4], 4),
]
P = Perm(None,*test_cases[-1])
for p in P:
print p
#raw_input()
Example result from the input ([1,3,1,2,4], 4)
:
[1, 3, 0, 0, 0]
[1, 2, 1, 0, 0]
[0, 3, 1, 0, 0]
[1, 2, 0, 1, 0]
[0, 3, 0, 1, 0]
[1, 1, 1, 1, 0]
[0, 2, 1, 1, 0]
[1, 1, 0, 2, 0]
[0, 2, 0, 2, 0]
[1, 0, 1, 2, 0]
[0, 1, 1, 2, 0]
[1, 2, 0, 0, 1]
[0, 3, 0, 0, 1]
[1, 1, 1, 0, 1]
[0, 2, 1, 0, 1]
[1, 1, 0, 1, 1]
[0, 2, 0, 1, 1]
[1, 0, 1, 1, 1]
[0, 1, 1, 1, 1]
[1, 0, 0, 2, 1]
[0, 1, 0, 2, 1]
[0, 0, 1, 2, 1]
[1, 1, 0, 0, 2]
[0, 2, 0, 0, 2]
[1, 0, 1, 0, 2]
[0, 1, 1, 0, 2]
[1, 0, 0, 1, 2]
[0, 1, 0, 1, 2]
[0, 0, 1, 1, 2]
[0, 0, 0, 2, 2]
[1, 0, 0, 0, 3]
[0, 1, 0, 0, 3]
[0, 0, 1, 0, 3]
[0, 0, 0, 1, 3]
[0, 0, 0, 0, 4]
Performance Each next()
call takes O(h)
where h
is the number of types of items (size of haves
list).
Upvotes: 1
Reputation: 275740
I'd write a simple helper class that does increment
, highbit
and for_each_bit
.
I'd first wrap an unsigned int
, and limit it to 32 bits, and maybe extend it by std::bitset
or a std::vector<uint32_t>
if I was feeling ambitious -- but by having these 3 methods to start, I can test it and get it working.
increment
is easy, esp on a naked 32 bit int.
highbit
returns the bit position of the highest set bit.
for_each_bit
has this signature in C++:
template<typename Lambda>
void for_each_bit( my_bignum const& num, Lambda&& func )
and it calls func
with the index of each set bit in num
.
That should take at most a few minutes to write up.
Throw away hasNext
, follow the iterator concept -- you have a begin
subset and an end
subset, and the end
is not valid to extract the value of. Dereferencing these iterators produces the subset in question (or produces a factory for said subset).
end
is now easy to work out -- if highbit
is >= the number of elements in your set, you are past the end of the permutations.
begin
is either zero, or 1, depending on if you want the empty subset to be included.
next
just increments your bignum
.
Producing the subset simply involves calling for_each_bit
, and putting that item from your set into the subset.
Next, improve increment
to allow random access, and you can then implement iterating over the subsets in parallel!
This solves the set problem. To solve the mutltiset problem, first solve the derived set problem (where you pretend there is only 0 or 1 of each element), and iterate over that. Then, on each iteration of the derived set, build up a std::vector
of the max count of each element.
Then do something like this:
#include <utility>
#include <cstddef>
#include <vector>
using std::size_t;
namespace details {
template<typename Lambda>
void for_each_multiset_combo_worker( std::vector<size_t> const& counts, Lambda&& lambda, std::vector<size_t>& indexes, std::vector<size_t>& current )
{
if (depth >= counts.size()) {
lambda( current );
return;
}
for (size_t i = 0; i <= counts[depth]; ++i) {
// Assert: current.size() == depth
current.push_back(i);
// Assert: current.back() == i
// Assert: current.size() == dpeth+1
for_each_multiset_combo_worker( counts, lambda, depth+1, current );
// Assert: current.back() == i
// Assert: current.size() == dpeth+1
current.pop_back();
// Assert: current.size() == depth
}
}
}
template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
std::vector<size_t> current;
current.reserve( counts.size() );
details::for_each_multiset_combo_worker( counts, std::forward<Lambda>(lambda), 0, current );
}
#include <iostream>
int main() {
std::vector<size_t> multiset = {3, 2, 1, 1};
size_t counter = 0;
for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
std::cout << counter << ": [";
for(auto it = counts.begin(); it != counts.end(); ++it) {
if (it != counts.begin()) {
std::cout << ", ";
}
std::cout << *it;
}
std::cout << "]\n";
++counter;
});
}
Live example: http://ideone.com/8GN1xx
In this live example, I skipped the optimization of doing the set iteration first, and instead directly iterated over the multiset.
(Limitations: no more than max size_t
elements of each type, and no more than max capacity of std::vector
different types of elements).
I have no need for the leading "number of distinct elements in the multiset", so I didn't use it.
And here is the iterative version of the above recursive algorithm, using the usual "turn the implicit recursion stack into an explicit iteration stack" technique:
#include <utility>
#include <cstddef>
#include <vector>
using std::size_t;
template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
// below code is easier if I assume counts is non-empty:
if (counts.empty())
{
lambda(counts);
return;
}
// preallocate a buffer big enough to hold the output counts:
std::vector<size_t> indexes;
indexes.reserve( counts.size() );
while(true) {
// append 0s on the end of indexes if we have room:
while (indexes.size() < counts.size()) {
indexes.push_back(0);
}
// at this point, we have a unique element. Pass it to the passed in lambda:
lambda( indexes );
// The advancement logic. Advance the highest index. If that overflows, pop it and
// advance the next highest index:
indexes.back()++;
while (indexes.back() > counts[indexes.size()-1]) {
indexes.pop_back();
// we are done if we have managed to advance every index, and there are none left to advance:
if (indexes.empty())
return; // finished
indexes.back()++;
}
}
}
#include <iostream>
int main() {
std::vector<size_t> multiset = {3, 2, 1, 1};
size_t counter = 0;
for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
std::cout << counter << ": [";
for(auto it = counts.begin(); it != counts.end(); ++it) {
if (it != counts.begin()) {
std::cout << ", ";
}
std::cout << *it;
}
std::cout << "]\n";
++counter;
});
}
Upvotes: 1
Reputation: 18148
This paper provides an efficient iterative algorithm for generating multiset permutations on page 8
This paper provides another iterative algorithm, also on page 8
Upvotes: 0