Reputation: 2611
I do not know if it’s appropriate to ask this question here so sorry if it is not.
I got a sequence ALPHA, for example :
A B D Z A B X
I got a list of subsequences of ALPHA, for example :
A B D
B D
A B
D Z
A
B
D
Z
X
I search an algorithm that find the minimum length of disjointed subsequences that reconstruct ALPHA, for example in our case :
{A B D} {Z} {A B} {X}
Any ideas? My guess is something already exists.
Upvotes: 1
Views: 130
Reputation: 3038
You can transform this problem into finding a minimum path in a graph.
The nodes will correspond to prefixes of the string, including one for the empty string. There will be an edge from a node A to a node B if there is an allowed sub-sequence that, when appended to the string prefix A, the result is the string prefix B.
The question is now transformed into finding the minimum path in the graph starting from the node corresponding to the empty string, and ending in the node corresponding to the entire input string.
You can now apply e.g. BFS (since the edges have uniform costs), or Dijkstra's algorithm to find this path.
The following python code is an implementation based on the principles above:
def reconstruct(seq, subseqs):
n = len(seq)
d = dict()
for subseq in subseqs:
d[subseq] = True
# in this solution, the node with value v will correspond
# to the substring seq[0: v]. Thus node 0 corresponds to the empty string
# and node n corresponds to the entire string
# this will keep track of the predecessor for each node
predecessors = [-1] * (n + 1)
reached = [False] * (n + 1)
reached[0] = True
# initialize the queue and add the first node
# (the node corresponding to the empty string)
q = []
qstart = 0
q.append(0)
while True:
# test if we already found a solution
if reached[n]:
break
# test if the queue is empty
if qstart > len(q):
break
# poll the first value from the queue
v = q[qstart]
qstart += 1
# try appending a subsequence to the current node
for n2 in range (1, n - v + 1):
# the destination node was already added into the queue
if reached[v + n2]:
continue
if seq[v: (v + n2)] in d:
q.append(v + n2)
predecessors[v + n2] = v
reached[v + n2] = True
if not reached[n]:
return []
# reconstruct the path, starting from the last node
pos = n
solution = []
while pos > 0:
solution.append(seq[predecessors[pos]: pos])
pos = predecessors[pos]
solution.reverse()
return solution
print reconstruct("ABDZABX", ["ABD", "BD", "AB", "DZ", "A", "B", "D", "Z", "X"])
I don't have much experience with python, that's the main reason why I preferred to stick to the basics (e.g. implementing a queue with a list + an index to the start).
Upvotes: 2