Reputation: 619
I have been working on a fast, efficient way to solve the following problem, but as of yet, I have only been able to solve it using a rather slow, nest-loop solution. Anyways, here is the description:
So I have a string of length L, lets say 'BBBX'
. I want to find all possible strings of length L, starting from 'BBBX'
, that differ at, at most, 2 positions and, at least, 0 positions. On top of that, when building the new strings, new characters must be selected from a specific alphabet.
I guess the size of the alphabet doesn't matter, so lets say in this case the alphabet is ['B', 'G', 'C', 'X']
.
So, some sample output would be, 'BGBG'
, 'BGBC'
, 'BBGX'
, etc. For this example with a string of length 4 with up to 2 substitutions, my algorithm finds 67 possible new strings.
I have been trying to use itertools
to solve this problem, but I am having a bit of difficulty finding a solution. I try to use itertools.combinations(range(4), 2)
to find all the possible positions. I am then thinking of using product()
from itertools
to build all of the possibilities, but I am not sure if there is a way I could connect it somehow to the indices from the output of combinations()
.
Upvotes: 4
Views: 249
Reputation: 70715
Not tested much, but it does find 67 for the example you gave. The easy way to connect the indices to the products is via zip()
:
def sub(s, alphabet, minsubs, maxsubs):
from itertools import combinations, product
origs = list(s)
alphabet = set(alphabet)
for nsubs in range(minsubs, maxsubs + 1):
for ix in combinations(range(len(s)), nsubs):
prods = [alphabet - set(origs[i]) for i in ix]
s = origs[:]
for newchars in product(*prods):
for i, char in zip(ix, newchars):
s[i] = char
yield "".join(s)
count = 0
for s in sub('BBBX', 'BGCX', 0, 2):
count += 1
print s
print count
Note: the major difference from FogleBird's is that I posted first - LOL ;-) The algorithms are very similar. Mine constructs the inputs to product()
so that no substitution of a letter for itself is ever attempted; FogleBird's allows "identity" substitutions, but counts how many valid substitutions are made and then throws the result away if any identity substitutions occurred. On longer words and a larger number of substitutions, that can be much slower (potentially the difference between len(alphabet)**nsubs
and (len(alphabet)-1)**nsubs
times around the ... in product():
loop).
Upvotes: 1
Reputation: 76822
Here's my solution.
The first for loop tells us how many replacements we will perform. (0, 1 or 2 - we go through each)
The second loop tells us which letters we will change (by their indexes).
The third loop goes through all of the possible letter changes for those indexes. There's some logic to make sure we actually change the letter (changing "C" to "C" doesn't count).
import itertools
def generate_replacements(lo, hi, alphabet, text):
for count in range(lo, hi + 1):
for indexes in itertools.combinations(range(len(text)), count):
for letters in itertools.product(alphabet, repeat=count):
new_text = list(text)
actual_count = 0
for index, letter in zip(indexes, letters):
if new_text[index] == letter:
continue
new_text[index] = letter
actual_count += 1
if actual_count == count:
yield ''.join(new_text)
for text in generate_replacements(0, 2, 'BGCX', 'BBBX'):
print text
Here's its output:
BBBX GBBX CBBX XBBX BGBX BCBX BXBX BBGX BBCX BBXX BBBB BBBG BBBC GGBX
GCBX GXBX CGBX CCBX CXBX XGBX XCBX XXBX GBGX GBCX GBXX CBGX CBCX CBXX
XBGX XBCX XBXX GBBB GBBG GBBC CBBB CBBG CBBC XBBB XBBG XBBC BGGX BGCX
BGXX BCGX BCCX BCXX BXGX BXCX BXXX BGBB BGBG BGBC BCBB BCBG BCBC BXBB
BXBG BXBC BBGB BBGG BBGC BBCB BBCG BBCC BBXB BBXG BBXC
Upvotes: 2