Reputation: 3089
I was going through suffix array and its use to compute longest common prefix of two suffixes.
The source says:
"The lcp between two suffixes is the minimum of the lcp's of all pairs of adjacent suffixes between them on the array"
i.e. lcp(x,y)=min{ lcp(x,x+1),lcp(x+1,x+2),.....,lcp(y-1,y) }
where x and y are two index of the string from where the two suffix of the string starts.
I am not convinced with the statement as in example of string "abca"
.
lcp(1,4)=1
(considering 1 based indexing)
but if I apply the above equation then
lcp(1,4)=min{lcp(1,2),lcp(2,3),lcp(3,4)}
and I think lcp(1,2)=0
.
so the answer must be 0
according to the equation.
Am i getting it wrong somewhere?
Upvotes: 0
Views: 903
Reputation: 5191
You can't find LCP of any two suffixes by simply calculating the minimum of the lcp's of all pairs of adjacent suffixes between them on the array.
We can calculate the LCPs of any suffixes (i,j) with the Help of Following :
LCP(suffix i,suffix j)=LCP[RMQ(i + 1; j)]
Also Note (i<j)
as LCP (suff i,suff j)
may not necessarly equal LCP (Suff j,suff i)
.
RMQ is Range Minimum Query .
Page 3 of this paper.
Details:
Step 1: First Calculate LCP of Adjacents /consecutive Suffix Pairs .
n= Length of string.
suffixArray[] is Suffix array.
void calculateadjacentsuffixes(int n)
{
for (int i=0; i<n; ++i) Rank[suffixArray[i]] = i;
Height[0] = 0;
for (int i=0, h=0; i<n; ++i)
{
if (Rank[i] > 0)
{
int j = suffixArray[Rank[i]-1];
while (i + h < n && j + h < n && str[i+h] == str[j+h])
{
h++;
}
Height[Rank[i]] = h;
if (h > 0) h--;
}
}
}
Note: Height[i]=LCPs of (Suffix i-1 ,suffix i) ie. Height array contains LCP of adjacent suffix.
Step 2:
Calculate LCP of Any two suffixes i,j using RMQ concept. RMQ pre-compute function:
void preprocesses(int N)
{
int i, j;
//initialize M for the intervals with length 1
for (i = 0; i < N; i++)
M[i][0] = i;
//compute values from smaller to bigger intervals
for (j = 1; 1 << j <= N; j++)
{
for (i = 0; i + (1 << j) - 1 < N; i++)
{
if (Height[M[i][j - 1]] < Height[M[i + (1 << (j - 1))][j - 1]])
{
M[i][j] = M[i][j - 1];
}
else
{
M[i][j] = M[i + (1 << (j - 1))][j - 1];
}
}
}
}
Step 3: Calculate LCP between any two Suffixes i,j
int LCP(int i,int j)
{
/*Make sure we send i<j always */
/* By doing this ,it resolve following
suppose ,we send LCP(5,4) then it converts it to LCP(4,5)
*/
if(i>j)
swap(i,j);
/*conformation over*/
if(i==j)
{
return (Length_of_str-suffixArray[i]);
}
else
{
return Height[RMQ(i+1,j)];
//LCP(suffix i,suffix j)=LCPadj[RMQ(i + 1; j)]
//LCPadj=LCP of adjacent suffix =Height.
}
}
Where RMQ function is:
int RMQ(int i,int j)
{
int k=log((double)(j-i+1))/log((double)2);
int vv= j-(1<<k)+1 ;
if(Height[M[i][k]]<=Height[ M[vv][ k] ])
return M[i][k];
else
return M[ vv ][ k];
}
Refer Topcoder tutorials for RMQ.
You can check the complete implementation in C++ at my blog.
Upvotes: 0
Reputation: 2618
I think the index referred by the source is not the index of the string itself, but index of the sorted suffixes.
a
abca
bca
ca
Hence
lcp(1,2) = lcp(a, abca) = 1
lcp(1,4) = min(lcp(1,2), lcp(2,3), lcp(3,4)) = 0
Upvotes: 2