Frank Noam
Frank Noam

Reputation: 111

Python long permutations with repeated characters

I'm trying to generate all the possible permutations of a string like "0000011111" or "000 11 2222 333". I tried using permutations from itertools on "0000011111" like so:

from itertools import permutations

basestring = "0"*5 +"1"*5 
perms = [''.join(p) for p in permutations(basestring)]
print(len(perms), perms)
print(len(set(perms)), set(perms))

But the list perms had 3 million entries when there are only 10 C 5 = 252 permutations.

Is there a built in tool I can use that is better at handling permutations of strings with many repeated characters?


Otherwise how is this algorithm for generating the permutations (for "0000 1111 222")?

Start with 2 characters        "0000 1111"
Move right most 0 over one     "0001 0111" and add it to the list
Continue moving it to the end  "0001 1011" -> "0001 1101" -> "0001 1110"

Now move the next 0 over one   "0010 0111" -> "0010 1011"
...
Until you get to "1111 0000".

Then for each of the strings generated, repeat the process with 2's.
222 xxxx xxxx -> 22x 2xxx xxxx -> 22x x2xx xxx...

Or am I better off just doing set(perms) to get rid of the repeats? (I need to permute 20 character lists with 3-5 characters where itertools permutations would give me 10e18 strings)


I've been programming casually for 3 years, but only know as much as someone with 1 semester of a programming class.

Upvotes: 5

Views: 1138

Answers (2)

גלעד ברקן
גלעד ברקן

Reputation: 23945

I'm not sure how efficient this is but you could try something like:

map = ['0','1','2','3']
perms = []

def multisetPerms(s,result):
  if all(v is 0 for v in s):
    perms.append(result)
    return
  for i in xrange(len(s)):
    if s[i] > 0:
      _s = s[:]
      _s[i] = _s[i] - 1
      multisetPerms(_s,result + map[i])

multisetPerms([3,2,4],'') # 9!/(3!*2!*4!) = 1260
print (len(perms), perms[0:10]) 

Output:

(1260, ['000112222', '000121222', '000122122', '000122212', '000122221'
      , '000211222', '000212122', '000212212', '000212221', '000221122'])

Upvotes: 0

paisanco
paisanco

Reputation: 4164

Firstly let's look at your first example.

from itertools import permutations
basestring = "0"*5 +"1"*5

This gives basestring = [0000011111]

Calling permutations(basestring) without any argument will give you all permutations of length n of the n position string, which is just n! That's indeed a large number for n=10. Is that really what you want?

Next, if you are looking for the permutations of this string of length 5, you need to specify that length 5 in the call to itertools.permutations.

perms = [''.join(p) for p in permutations(basestring,5)]

This will return all permutations of length 5 of all characters in basestring by position, not value. So you will get some duplicates.

As documented in the itertools.permutations documentation see Python 2 version here, the number of permutations of length r on a string of length n returned by that function will be

n!/(n-r)! or in this case 30240 for n=10, r=5.

If you want to remove the duplicates, you can take

set(perms)

The number of combinations returned by this will be len(set(perms)) = 2^5 or 32. This is the number of strings of length k that can be formed from an "alphabet" of length n, which is n^k. The "alphabet" is the unique characters in your basestring - there are 2 of them (0 and 1) so you can form 32 unique strings of length 5 from that.

Upvotes: 0

Related Questions