Riki
Riki

Reputation: 1806

Algorithm to find "ordered combinations"

I need an algorithm to find, what I call, "ordered combinations" (Maybe someone knows the real name for this if there is one). Of course I already tried to come up with an algorithm on my own but I'm really stuck.

How it should work:

Given 2 lists (not sets, order is important here!) of elements that are guaranteed to contain the same elements, all ordered combinations. An ordered combination is a 2-tuple, 3-tuple, ... n-tuple (no limit on N) of elements that appear in the same order in both lists.

I'm not really sure if that makes it clear so here are multiple examples: (List1, List2, Expected Result, Annotation)

ASDF
ADSF
Result: AS, AD, AF, SF, DF, ASF, ADF

Note: ASD is not a valid result because there is no way to have ascending indices in the second list for this combination

ADSD
ASDD
Result: AD, AS, AD, DD, SD, ASD, ADD

Note: AD appears twice because it can be created from indices 1,2 and 1,4 and in the second list 1,3 and 1,4. But it would also be correct if it only appears once. Also D appears twice in both lists in an order, so this allows ADD as a valid combination too.

SDFG
SDFG
Result: SD, SF, SG, DF, DG, FG, SDF, SFG, SDG, DFG, SDFG, 

Note: Same input; all combinations are possible

ABCDEFG
GFEDCBA
Result: <empty>

Note: There are no combinations that appear in the same order in both lists

QWRRRRRRR
WRQ
Result: WR

Note: The only combination that appears in the same order in both sets is WR

Notes:

Upvotes: 2

Views: 273

Answers (1)

David Eisenstat
David Eisenstat

Reputation: 65458

I would describe this problem as enumerating common subsequences of two strings. As a first cut, make a method like this, which chooses the first letter nondeterministically and recurses (Python, sorry).

def commonsubseqs(word1, word2, prefix=''):
    if len(prefix) >= 2:
        print(prefix)
    for letter in set(word1) & set(word2):  # set intersection
        # figure out what's left after consuming the first instance of letter
        remainder1 = word1[word1.index(letter) + 1:]
        remainder2 = word2[word2.index(letter) + 1:]
        # take letter and recurse
        commonsubseqs(remainder1, remainder2, prefix + letter)

If this simple solution is not fast enough for you, then it can be improved as follows. For each pair of suffixes of the two words, we precompute the list of recursive calls. In Python again:

def commonsubseqshelper(table, prefix, i, j):
    if len(prefix) >= 2:
        print(''.join(prefix))
    for (letter, i1, j1) in table[i][j]:
        prefix.append(letter)
        commonsubseqshelper(table, prefix, i1, j1)
        del prefix[-1]  # delete the last item

def commonsubseqs(word1, word2):
    table = [[[(letter, word1.index(letter, i) + 1, word2.index(letter, j) + 1)
               for letter in set(word1[i:]) & set(word2[j:])]
              for j in range(len(word2) + 1)]  # 0..len(word2)
             for i in range(len(word1) + 1)]   # 0..len(word1)
    commonsubseqshelper(table, [], 0, 0)

This polynomial-time preprocessing step improves the speed of enumeration to its asymptotic optimum.

Upvotes: 2

Related Questions