Akashdeep Saluja
Akashdeep Saluja

Reputation: 3089

Longest Common Prefix property

I was going through suffix array and its use to compute longest common prefix of two suffixes.

The source says:

"The lcp between two suffixes is the minimum of the lcp's of all pairs of adjacent suffixes between them on the array"

i.e. lcp(x,y)=min{ lcp(x,x+1),lcp(x+1,x+2),.....,lcp(y-1,y) } where x and y are two index of the string from where the two suffix of the string starts.

I am not convinced with the statement as in example of string "abca".

lcp(1,4)=1 (considering 1 based indexing)

but if I apply the above equation then

lcp(1,4)=min{lcp(1,2),lcp(2,3),lcp(3,4)}

and I think lcp(1,2)=0.

so the answer must be 0 according to the equation.

Am i getting it wrong somewhere?

Upvotes: 0

Views: 903

Answers (2)

Ritesh Kumar Gupta
Ritesh Kumar Gupta

Reputation: 5191

You can't find LCP of any two suffixes by simply calculating the minimum of the lcp's of all pairs of adjacent suffixes between them on the array.

We can calculate the LCPs of any suffixes (i,j) with the Help of Following :

LCP(suffix i,suffix j)=LCP[RMQ(i + 1; j)]  

Also Note (i<j) as LCP (suff i,suff j) may not necessarly equal LCP (Suff j,suff i). RMQ is Range Minimum Query .

Page 3 of this paper.

Details:

Step 1: First Calculate LCP of Adjacents /consecutive Suffix Pairs .

n= Length of string.

suffixArray[] is Suffix array.

void calculateadjacentsuffixes(int n)
{
    for (int i=0; i<n; ++i) Rank[suffixArray[i]] = i;
    Height[0] = 0;
    for (int i=0, h=0; i<n; ++i)
    {
        if (Rank[i] > 0)
        {
            int j = suffixArray[Rank[i]-1];
            while (i + h < n && j + h < n && str[i+h] == str[j+h])
            {
                h++;
            }
            Height[Rank[i]] = h;
            if (h > 0) h--;
        }
    }
}

Note: Height[i]=LCPs of (Suffix i-1 ,suffix i) ie. Height array contains LCP of adjacent suffix.

Step 2:

Calculate LCP of Any two suffixes i,j using RMQ concept. RMQ pre-compute function:

void preprocesses(int N)
{
    int i, j;

    //initialize M for the intervals with length 1
    for (i = 0; i < N; i++)
        M[i][0] = i;

    //compute values from smaller to bigger intervals
    for (j = 1; 1 << j <= N; j++)
    {
        for (i = 0; i + (1 << j) - 1 < N; i++)
        {
            if (Height[M[i][j - 1]] < Height[M[i + (1 << (j - 1))][j - 1]])
            {
                M[i][j] = M[i][j - 1];
            }
            else
            {
                M[i][j] = M[i + (1 << (j - 1))][j - 1];
            }
        }
    }
}  

Step 3: Calculate LCP between any two Suffixes i,j

int LCP(int i,int j)
{
    /*Make sure we send i<j always */
    /* By doing this ,it resolve following
    suppose ,we send LCP(5,4) then it converts it to LCP(4,5)
    */
    if(i>j)
        swap(i,j);

    /*conformation over*/

    if(i==j)
    {
        return (Length_of_str-suffixArray[i]);
    }
    else
    {
        return Height[RMQ(i+1,j)];
        //LCP(suffix i,suffix j)=LCPadj[RMQ(i + 1; j)] 
        //LCPadj=LCP of adjacent suffix =Height.
    }
}

Where RMQ function is:

int RMQ(int i,int j)
{
    int k=log((double)(j-i+1))/log((double)2);
    int vv= j-(1<<k)+1 ;
    if(Height[M[i][k]]<=Height[ M[vv][ k] ])
        return M[i][k];
    else
        return M[ vv ][ k];
}

Refer Topcoder tutorials for RMQ.

You can check the complete implementation in C++ at my blog.

Upvotes: 0

marcadian
marcadian

Reputation: 2618

I think the index referred by the source is not the index of the string itself, but index of the sorted suffixes.

a
abca
bca
ca

Hence

lcp(1,2) = lcp(a, abca) = 1
lcp(1,4) = min(lcp(1,2), lcp(2,3), lcp(3,4)) = 0

Upvotes: 2

Related Questions