Reputation:
Here is a pseudo code for computing the border array in KMP.
p is the pattern
border[1]:=-1
i:=border[1]
for j=2,...,m
while i >= 0 and p[i+1] != p[j-1] do i = border[i+1]
i++
border[j]:=i
I can execute the following pseudo code to compute the border array but the problem I am having right now is that I don't really understand the border array meaning how to interpret it.
For instance if the pattern does not equal at position (i+1) and (j-1) the variable i is set to border[i+1]. Why is that for example?
I realized the missing understanding when I tried to answer the question that three consecutive entries in a border array cannot differ by one from its predecessor. E.g. border[10]=3, border[11]=2, border[12]=1
I would appreciate a good explanation in order to get a better understanding.
Upvotes: 2
Views: 2453
Reputation: 11
OP 1."I don't really understand the border array meaning how to interpret it."
In this case, border means prefix and suffix of string.
Border value means longest prefix == suffix
.
Gassa example - brackets []
mean [prefix]==[suffix]
why does border[3]=1?
column: [0]12[3]
string: [a]bc[a]
why does border[4]=2?
column: [01]2[34]
string: [ab]c[ab]
why does border[5]=1?
column: [0]1234[5]
string: [a]bcab[a]
OP 2."For instance if the pattern does not equal at position (i+1) and (j-1) the variable i is set to border[i+1]. Why is that for example?"
The comparison is checking if the last prefix/suffix character match
for example: (p[i+1] != p[j-1])
if they don't match, reduce comparison size of prefix/suffix from border[j-1]+1 to border[i+1]
for example: i = border[i+1]
OP 3."question that three consecutive entries in a border array cannot differ by one from its predecessor. E.g. border[10]=3, border[11]=2, border[12]=1"
When size of string increases by 1, two things can happen
Size of longest prefix/suffix can only increase by 1 or is less than that
In other words, border[i+1]<=border[i]+1
also, did you mean border array values canNOT differ more than one from its predecessor?
because here is an example of predecessor CAN differ more than one
column: 01234
string: babaa
border: 00120
as you can see border[3]=2
and border[4]=0
so they differ by 2 not 1
sources:
I learn that border value means longest prefix == suffix
from gassa link stackoverflow
Upvotes: 1
Reputation: 8846
What you call the border array is the prefix function. There are many explanations, see Stackoverflow, Wikipedia, or just google an explanation more suitable for you.
As for the second part of your question, the following string is an example for the property you ask for:
column: 0123456
string: abcabac
border: 0001210
Here, border[4] = 2
because ab = ab
, border[5] = 1
because a = a
, and border[6] = 0
.
Whether all three values can be non-zero (for example, 3, 2, 1) is an interesting question.
Upvotes: 0