John Humphreys
John Humphreys

Reputation: 39254

Strategy to modify permutation algorithm to prevent duplicate printouts

I've been reviewing algorithms for practice, and I'm currently looking at a permutation algorithm that I quite like:

void permute(char* set, int begin, int end) {
    int range = end - begin;

    if (range == 1)
        cout << set << endl;
    else {
        for(int i = 0; i < range; ++i) {
            swap(&set[begin], &set[begin+i]);
            permute(set, begin+1, end);
            swap(&set[begin], &set[begin+i]);
        }
    }
}

I actually wanted to apply this to a situation where there will be many repeated characters though, so I need to be able to modify it to prevent the printing of duplicate permutations.

How would I go about detecting that I was generating a duplicate? I know I could store this in a hash or something similar, but that's not an optimal solution - I'd prefer one that didn't require extra storage. Can someone give me a suggestion?

PS: I don't want to use the STL permutation mechanisms, and I don't want a reference to another "unique permutation algorithm" somewhere. I'd like to understand the mechanism used to prevent duplication so I can build it into this in learn, if possible.

Upvotes: 6

Views: 4434

Answers (6)

Simply insert each element to a set. It automatically removes duplicates. Declare set s as global variable.

set <string>s;
void permute(string a, int l, int r) {
    int i;
    if (l == r)
        s.insert(a);
    else
    {
        for (i = l; i <= r; i++)
        {
            swap((a[l]), (a[i]));
            permute(a, l+1, r);
            swap((a[l]), (a[i])); //backtrack
        }
    }
}

Finally print using the function

void printr()
{
    set <string> ::iterator itr;
    for (itr = s.begin(); itr != s.end(); ++itr)
    {
        cout << '\t' << *itr;
    }
    cout << '\t' << *itr;
}

Upvotes: 1

Moto Lee
Moto Lee

Reputation: 1

The key is not to swap the same character twice. So, you could use an unordered_set to memorize which characters have been swapped.

void permute(string& input, int begin, vector<string>& output) {
    if (begin == input.size()){
        output.push_back(input);
    }
    else {    
        unordered_set<char> swapped;
        for(int i = begin; i < input.size(); i++) {
            // Do not swap a character that has been swapped
            if(swapped.find(input[i]) == swapped.end()){
                swapped.insert(input[i]);
                swap(input[begin], input[i]);
                permute(input, begin+1, output);
                swap(input[begin], input[i]);
            }
        }
    }
}

You can go through your original code by hand, and you will find those cases where duplicates occur are "swapping with the character which has been swapped."

Ex: input = "BAA"

index = 0, i = 0, input = "BAA"

----> index = 1, i = 1, input = "BAA"

----> index = 1, i = 2, input = "BAA" (duplicate)

index = 0, i = 1, input = "ABA"

----> index = 1, i = 1, input = "ABA"

----> index = 1, i = 2, input = "AAB"

index = 0, i = 2, input = "AAB"

----> index = 1, i = 1, input = "AAB" (duplicate)

----> index = 1, i = 2, input = "ABA" (duplicate)

Upvotes: 0

Rusty Rob
Rusty Rob

Reputation: 17173

A simple solution is to change the duplicate characters randomly to characters that aren't already present. Then after permutation, change the characters back. Only accept a permutation if its characters are in order.

e.g. if you have "a,b,b"

you would have had the following:

a b b
a b b
b a b
b a b
b b a
b b a

But, if we start with a,b,b and note the duplicate b's, then we can change the second b to a c

now we have a b c

a b c - accept because b is before c. change c back to b to get a b b
a c b - reject because c is before b
b a c - accept as b a b
b c a - accept as b b a
c b a - reject as c comes before b.
c a b - reject as c comes before b.

Upvotes: 2

Reinstate Monica
Reinstate Monica

Reputation: 4723

There is no general way to prevent arbitrary functions from generating duplicates. You can always filter out the duplicates, of course, but you don't want that, and for very good reasons. So you need a special way to generate only non-duplicates.

One way would be to generate the permutations in increasing lexicographical order. Then you can just compare if a "new" permatutation is the same as the last one, and then skip it. It gets even better: the algorithm for generating permutations in increasing lexicographical order given at http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order doesn't even generate the duplicates at all!

However, that is not an answer to your question, as it is a different algorithm (although based on swapping, too).

So, let's look at your algorithm a little closer. One key observation is:

  • Once a character is swapped to position begin, it will stay there for all nested calls of permute.

We'll combine this with the following general observation about permutations:

  • If you permute a string s, but only at positions where there's the same character, s will remain the same. In fact, all duplicate permutations have a different order for the occurences of some character c, where c occurs at the same positions.

OK, so all we have to do is to make sure that the occurences of each character are always in the same order as in the beginning. Code follows, but... I don't really speak C++, so I'll use Python and hope to get away with claiming it's pseudo code.

We start by your original algorithm, rewritten in 'pseudo code':

def permute(s, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]

and a helper function that makes calling it easier:

def permutations_w_duplicates(s):
    permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable

Now we extend the permute function with some bookkeeping about how many times a certain character has been swapped to the begin position (i.e. has been fixed), and we also remember the original order of the occurences of each character (char_number). Each character that we try to swap to the begin position then has to be the next higher in the original order, i.e. the number of fixes for a character defines which original occurence of this character may be fixed next - I call this next_fixable.

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]: 
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]

                s[begin], s[i] = s[i], s[begin]
                permute2(s, next_fixable, char_number, begin + 1, end)
                s[begin], s[i] = s[i], s[begin]

                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

Again, we use a helper function:

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    permute2(list(s), next_fixable, char_number, 0, len(s))

That's it!

Almost. You can stop here and rewrite in C++ if you like, but if you are interested in some test data, read on.

I used a slightly different code for testing, because I didn't want to print all permutations. In Python, you would replace the print with a yield, with turns the function into a generator function, the result of which can be iterated over with a for loop, and the permutations will be computed only when needed. This is the real code and test I used:

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        yield "".join(s) # join the characters to form a string
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                for p in permute2(s, next_fixable, char_number, begin + 1, end):
                    yield p
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    for p in permute2(list(s), next_fixable, char_number, 0, len(s)):
        yield p


s = "FOOQUUXFOO"
A = list(permutations_w_duplicates(s))
print("%s has %s permutations (counting duplicates)" % (s, len(A)))
print("permutations of these that are unique: %s" % len(set(A)))
B = list(permutations_wo_duplicates(s))
print("%s has %s unique permutations (directly computed)" % (s, len(B)))

print("The first 10 permutations       :", A[:10])
print("The first 10 unique permutations:", B[:10])

And the result:

FOOQUUXFOO has 3628800 permutations (counting duplicates)
permutations of these that are unique: 37800
FOOQUUXFOO has 37800 unique permutations (directly computed)
The first 10 permutations       : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX']
The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF']

Note that the permutations are computed in the same order than the original algorithm, just without the duplicates. Note that 37800 * 2! * 2! * 4! = 3628800, just like you would expect.

Upvotes: 7

Peter de Rivaz
Peter de Rivaz

Reputation: 33509

OPTION 1

One option would be to use 256 bits of storage on the stack to store a bitmask of which characters you had tried in the for loop, and only to recurse for new characters.

OPTION 2

A second option is to use the approach suggested in the comments ( http://n1b-algo.blogspot.com/2009/01/string-permutations.html) and change the for loop to:

else {
    char last=0;
    for(int i = 0; i < range; ++i) {
        if (last==set[begin+i])
            continue;
        last = set[begin+i];
        swap(&set[begin], &set[begin+i]);
        permute(set, begin+1, end);
        swap(&set[begin], &set[begin+i]);
    }
}

However, to use this approach you will also have to sort the characters set[begin],set[begin+1],...set[end-1] at the entry to the function.

Note that you have to sort every time the function is called. (The blog post does not seem to mention this, but otherwise you will generate too many results for an input string "aabbc". The problem is that the string does not stay sorted after swap is used.)

This is still not very efficient. For example, for a string containing 1 'a' and N 'b's this approach will end up calling the sort N times for an overall complexity of N^2logN

OPTION 3

A more efficient approach for long strings containing lots of repeats would be to maintain both the string "set" and a dictionary of how many of each type of character you have left to use. The for loop would change to a loop over the keys of the dictonary as these would be the unique characters that are allowed at that position.

This would have complexity equal to the number of output strings, and only a very small extra amount of storage to hold the dictionary.

Upvotes: 1

user537390
user537390

Reputation: 29

You could add an if statement to prevent the swap code from executing if it would swap two identical characters. The for loop is then

for(int i = 0; i < range; ++i) {
    if(i==0 || set[begin] != set[begin+i]) {
      swap(&set[begin], &set[begin+i]);
      permute(set, begin+1, end);
      swap(&set[begin], &set[begin+i]);
    }
}

The reason for allowing the case i==0 is make sure the recursive call happens exactly once even if all the characters of the set are the same.

Upvotes: 2

Related Questions