Reputation: 7895
I'm trying to implement the longest common substring algorithm in C, and after reading the post below, i'm really confused about the following part:
Now, the greatest value is LCP[2]=3, but it is for SA[1] and SA[2], both of which start in the string A. So, we ignore that. On the other hand, LCP[4]=2 is for SA[3] (corresponds to the suffix bc of B) and SA[4] (corresponding to suffix bcabc#bc of A). So, this is the longest common substring.
My LCP result is also different from the post example.
[#bc, abc#bc, abcabc#bc, bc, bc#bc, bcabc#bc, c, c#bc, cabc#bc]
[-1, 0, 3, 0, 2, 2, 0, 1, 1]
OR removing the first element
[abc#bc, abcabc#bc, bc, bc#bc, bcabc#bc, c, c#bc, cabc#bc]
[0, 3, 0, 2, 2, 0, 1, 1]
I see that SA[3] corresponds to bc, but SA[4] corresponds (i guess) to #bcbc. So, that's what confuses me.
Anyone can clarify on this? Thanks!
Upvotes: 1
Views: 614
Reputation: 16406
I see that SA[3] corresponds to bc, but SA[4] corresponds (i guess) to #bcbc.
There is no #bcbc to be found anywhere above. The quote says "SA[4] (corresponding to suffix bcabc#bc of A)" and that's certainly true:
0 1 2 3 4 5 6 7 index
[abc#bc, abcabc#bc, bc, bc#bc, bcabc#bc, c, c#bc, cabc#bc] Suffix Array
[0, 3, 0, 2, 2, 0, 1, 1] LCP
SA[2] is a suffix of B and SA[3] is a suffix of A(#B), so the longest common substring is bc, of length 2.
Upvotes: 1