Reputation: 41
I know the general LCS algorithm and can make the table.
But how to find the LCS with a gap constraint as shown in the picture above?
How can I make a table?
An example of LCS with gap constraint:
Upvotes: 2
Views: 968
Reputation: 5126
Longest common subsequence with a gap constraint can be thought of as longest common substring with a gap allowance. So we can modify the standard dynamic programming algorithm from Wikipedia.
I wrote the pseudocode to return just a single best substring/subsequence, but it should be adaptable to track all possible best solutions if required.
Should be equivalent to mcskinner's answer in terms of runtime complexity etc, but may be easier to implement in a lower level language because it doesn't require any data structures or recursion.
function LongestCommonSubstringWithGaps(S[1..r], T[1..n], gs, gt)
// Stores the length of the best solution ending with a match at i,j.
// If there is no match at i,j, stores zero.
L := array(1..r, 1..n)
// Stores the indices of the sub-solution used to construct each L.
Lkl := array(1..r, 1..n) of pairs
// Best solution length.
z := 0
// The indices of the last match of the best solution.
zij := (0, 0)
for i := 1..r
kmin := max(1, i - gs - 1)
for j := 1..n
// Nothing to do if there is no match.
if S[i] != T[j]
// Setting to zero unnecessary if L is already initialized.
L[i, j] := 0
continue
L[i, j] := 1
// Setting to zero unnecessary if Lkl is already initialized.
Lkl[i, j] := (0, 0)
// Searching backwards for possible sub-solutions to build upon.
lmin := max(1, j - gt - 1)
for k := i-1..kmin
for l := j-1..lmin
if L[k, l] + 1 > L[i, j]
L[i, j] := L[k, l] + 1
Lkl[i, j] := (k, l)
// In each row, every non-zero sub-solution can only be
// better than anything else farther down but still
// within the gap window.
// So stopping each row on first non-zero.
if L[k, l] > 0
break
if L[i, j] > z
z := L[i, j]
zij := (i, j)
// Rebuilding the solution by following individual steps backwards.
ret := ''
while zij != (0, 0)
ret := S[zij(1)] + ret
zij := Lkl[zij(1), zij(2)]
return ret
Upvotes: 1
Reputation: 2748
It is possible to design a recursive algorithm to handle this constraint.
By memoizing (i.e. caching) the answers, you get the same big-O efficiency as you would with Dynamic Programming. It will have a larger constant factor due to the use of function calls instead of for loops. But the two are equivalent, and you can think of DP as an "inside out" recursion with memoization.
The basic idea is to create a gap "budget" for each of the sequences, and then account for this when skipping letters. There are 4 cases, which closely track the standard recursive approach to LCS.
k
.Here's some Python code that does what you want. It could be made much more efficient, especially by using a better language than Python, but it should give you an idea.
def gap_lcs(lh, rh, k):
ln, rn = len(lh), len(rh)
memo = {}
choices = {}
def rec(li, ri, l_budget, r_budget):
# At the end of either sequence we are forced to use
# Case 1: terminate the match.
if li >= ln or ri >= rn:
return 0
# Cache results. This limits the complexity to O(ln * lm * k^2).
# Without this the recursion would take exponential time.
key = (li, ri, l_budget, r_budget)
if key in memo:
return memo[key]
# Case 1: terminate the match.
res = 0
choice = (0, 0)
# Case 2: matching characters, extend the sequence.
if lh[li] == rh[ri]:
test = 1 + rec(li + 1, ri + 1, k, k)
if test > res:
res = test
choice = (1, 1)
# Case 3: skip the left character if there's still budget.
if l_budget > 0:
test = rec(li + 1, ri, l_budget - 1, r_budget)
if test > res:
res = test
choice = (1, 0)
# Case 4: skip the right character if there's still budget.
if r_budget > 0:
test = rec(li, ri + 1, l_budget, r_budget - 1)
if test > res:
res = test
choice = (0, 1)
memo[key] = res
choices[key] = choice
return res
# Find the best combination of starting points within the two sequences.
# This is so the gap constraint will not apply to skips at the start.
res = 0
best_li, best_ri = 0, 0
for li in range(ln):
for ri in range(rn):
test = rec(li, ri, k, k)
if test > res:
res, best_li, best_ri = test, li, ri
# Reconstruct the LCS by following the choices we tracked,
# starting from the best start we found.
li, ri = best_li, best_ri
l_budget, r_budget = k, k
path = []
while True:
key = (li, ri, l_budget, r_budget)
# Case 1.
if key not in choices:
break
inc_li, inc_ri = choices[key]
# Case 1.
if inc_li == 0 and inc_ri == 0:
break
if inc_li == 1 and inc_ri == 1:
# Case 2.
l_budget, r_budget = k, k
path.append((lh[li], li, ri))
else:
# Cases 3 and 4.
l_budget -= inc_li
r_budget -= inc_ri
li += inc_li
ri += inc_ri
return path
This produces the desired output for your examples.
gap_lcs('RCLPCRR', 'RPPLCPLRC', 2)
# [('R', 0, 0), ('P', 3, 1), ('C', 4, 4), ('R', 6, 7)]
gap_lcs('RCLPCRR', 'RPPLCPLRC', 1)
# [('C', 1, 4), ('P', 3, 5), ('R', 5, 7)]
gap_lcs('RCLPCRR', 'RPPLCPLRC', 0)
# [('R', 0, 7), ('C', 1, 8)]
Upvotes: 2