hans glick
hans glick

Reputation: 2611

Find algorithm : Reconstruct a sequence with the minimum length combination of disjointed subsequences chosen from a list of subsequences

I do not know if it’s appropriate to ask this question here so sorry if it is not.

I got a sequence ALPHA, for example :

A B D Z A B X

I got a list of subsequences of ALPHA, for example :

A B D
B D
A B
D Z
A
B
D
Z
X

I search an algorithm that find the minimum length of disjointed subsequences that reconstruct ALPHA, for example in our case :

{A B D} {Z} {A B} {X}

Any ideas? My guess is something already exists.

Upvotes: 1

Views: 130

Answers (1)

danbanica
danbanica

Reputation: 3038

You can transform this problem into finding a minimum path in a graph.

The nodes will correspond to prefixes of the string, including one for the empty string. There will be an edge from a node A to a node B if there is an allowed sub-sequence that, when appended to the string prefix A, the result is the string prefix B.

The question is now transformed into finding the minimum path in the graph starting from the node corresponding to the empty string, and ending in the node corresponding to the entire input string.

You can now apply e.g. BFS (since the edges have uniform costs), or Dijkstra's algorithm to find this path.


The following python code is an implementation based on the principles above:

def reconstruct(seq, subseqs):
    n = len(seq)

    d = dict()
    for subseq in subseqs:
        d[subseq] = True

    # in this solution, the node with value v will correspond
    # to the substring seq[0: v]. Thus node 0 corresponds to the empty string
    # and node n corresponds to the entire string

    # this will keep track of the predecessor for each node
    predecessors = [-1] * (n + 1)
    reached = [False] * (n + 1)
    reached[0] = True

    # initialize the queue and add the first node
    # (the node corresponding to the empty string)
    q = []
    qstart = 0
    q.append(0)

    while True:
        # test if we already found a solution
        if reached[n]:
            break

        # test if the queue is empty
        if qstart > len(q):
            break

        # poll the first value from the queue
        v = q[qstart]
        qstart += 1

        # try appending a subsequence to the current node
        for n2 in range (1, n - v + 1):
            # the destination node was already added into the queue
            if reached[v + n2]:
                continue

            if seq[v: (v + n2)] in d:
                q.append(v + n2)
                predecessors[v + n2] = v
                reached[v + n2] = True

    if not reached[n]:
        return []

    # reconstruct the path, starting from the last node
    pos = n
    solution = []
    while pos > 0:
        solution.append(seq[predecessors[pos]: pos])
        pos = predecessors[pos]
    solution.reverse()

    return solution


print reconstruct("ABDZABX", ["ABD", "BD", "AB", "DZ", "A", "B", "D", "Z", "X"])

I don't have much experience with python, that's the main reason why I preferred to stick to the basics (e.g. implementing a queue with a list + an index to the start).

Upvotes: 2

Related Questions