shad0w.123
shad0w.123

Reputation: 21

Itertools - Merge two list to get all possible combinations

I have two lists: a and b.

a is a list which contains three or more string, while b is a list of separators.

I need to generate all the possible combinations of a and then "merge" the result with all possible combination of b (See the example for better understanding).

I ended up using this code:

from itertools import permutations, combinations, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

output = []

for com in combinations(b, len(a) - 1):
    for per in product(com, repeat=len(a) - 1):
        for ear_per in permutations(a):
            out = ''.join(map(''.join, zip(list(ear_per[:-1]), per))) + list(ear_per)[-1]
            output.append(out)

# For some reason the algorithm is generating duplicates
output = list(dict.fromkeys(output))

for o in output:
    print o

This is a sample of the output (which is correct, it is what I need in this case):

timestamp.customfilename
filenamecustom.timestamp
custom_filenametimestamp
timestamp_custom_filename
timestamp-filename.custom
custom_filename-timestamp
filename.timestamp-custom
. . .
filename.custom.timestamp
filename-customtimestamp
custom-timestamp_filename
filename_custom-timestamp
filename.timestampcustom
timestampcustom-filename
custom-timestamp.filename
filenamecustom_timestamp
timestamp.custom_filename
custom.timestampfilename
timestampfilename.custom
customfilename_timestamp
filenametimestamp-custom
custom-filenametimestamp
timestampfilename-custom
timestamp-custom-filename
custom.filenametimestamp
customfilenametimestamp
timestampfilename_custom
custom_filename.timestamp
custom-timestamp-filename
custom-timestampfilename
filename_timestamp.custom
. . .
filename.custom-timestamp
timestamp_filenamecustom
custom_timestampfilename
timestamp.custom.filename
timestamp.filename-custom
filename-custom-timestamp
customfilename.timestamp
filename_timestamp_custom
timestamp_filename.custom
customtimestampfilename
filenamecustomtimestamp
custom.timestamp_filename
filename_customtimestamp
. . .
timestamp-customfilename
filename_custom.timestamp

There are two main problems with this algorithm:

  1. It generates some duplicated lines, so I always need to delete them (slow on bigger sets of data)

  2. if len(a) > len(b) + 2 the script won't start. I that case I would need to repeat the separator to cover len(a) - 1 available spaces between words contained in a.

Upvotes: 2

Views: 410

Answers (2)

Marco Luzzara
Marco Luzzara

Reputation: 6036

You may be looking for this:

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
count = 0

def print_sequence(sol_words, sol_seps):
  global count 
  print("".join([sol_words[i] + sep for (i, sep) in enumerate(sol_seps)] + [sol_words[-1]]))
  count += 1

def backtrack_seps(sol_words, seps, sol_seps, i):
  for (si, sep) in enumerate(seps):
    sol_seps[i] = sep

    if i == len(sol_words) - 2:
      print_sequence(sol_words, sol_seps)
    else:
      backtrack_seps(sol_words, seps, sol_seps, i + 1)

def bt_for_sep(sol_words, seps):
  backtrack_seps(sol_words, seps, [''] * (len(sol_words) - 1), 0)

def backtrack_words(active, words, seps, sol_words, i):
  for (wi, word) in enumerate(words):
    if active[wi]:
      sol_words[i] = word
      active[wi] = False

      if i == len(words) - 1:
        bt_for_sep(sol_words, seps)
      else:
        backtrack_words(active, words, seps, sol_words, i + 1)

      active[wi] = True

backtrack_words([True] * len(a), set(a), set(b), [''] * len(a), 0)
print(count) #96

Usually when you need to enumerate all the possibilities of a certain set of values, backtracking is the way. The scheme for backtracking is always the same, and it is repeated for separators after using it to permute words.


EDIT

The second part of the problem, described as finding the combinations of separators, is actually the problem of finding all the dispositions with repetitions. Doing this was simpler than I thought: after you choose a separator from seps, instead of removing it (or disable it in this case), you just keep it in.

Upvotes: 0

Chris Charley
Chris Charley

Reputation: 6573

This may be a possible solution. It takes the permutations of a, (3*2 = 6), interleaved with the product of b (2 at a time here, 4*4 == 16), to get a total of 6 * 16 == 96 results.

from itertools import permutations, chain, zip_longest, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

i=0
for perm in permutations(a):
    for prod in product(b,repeat=len(a)-1):
        tpls = list(chain.from_iterable(zip_longest(perm, prod, fillvalue='')))
        print(''.join(tpls))
        i += 1
print(i)

Upvotes: 1

Related Questions