Reputation: 4317
I was asked to learn about KMP DFA, and what I've found in my book is that implementation, but our lecturer calls something "the prefix function" all the time. I really can't understand which part is this function here, can someone explain it to me ? I'm sorry if that was asked somewhere, but I couldn't find it.
public class KMP {
private String pat;
private String t;
private int[][] fsm;
public static final int ALPHABET = 256;
public KMP(String pat) {
this.pat = pat;
char[] pattern = pat.toCharArray();
int M = pattern.length;
fsm = new int[ALPHABET][pattern.length];
fsm[pattern[0]][0] = 1;
for(int X = 0, j = 1; j < M; j++) {
for(int c = 0; c < ALPHABET; c++) {
fsm[c][j] = fsm[c][X];
}
fsm[pattern[j]][j] = j + 1;
X = fsm[pattern[j]][X];
}
display(fsm);
}
public void search(String t) {
char[] text = t.toCharArray();
this.t = t;
int N = text.length;
int M = pat.length();
int i, j;
for(i = 0, j = 0; i < N; i++) {
j = fsm[t.charAt(i)][j];
if(j == M) {
System.out.println("Found at " + (i - M + 1));
j = 0;
}
}
}
Upvotes: 0
Views: 1059
Reputation: 1260
The KMP algorithm does not construct a DFA. What you have implemented is looks more like a DFA, which recognizes some string pattern
.
The idea behind KMP algorithm is to construct the so called prefix function for the given pattern
. And what is this function? It's definition is that for each position i
of the string we are interested in the length of the longest suffix of pattern[1..i]
, which is also a prefix of the pattern
string (0-indexed). This may sound confusing, but here is an example:
The prefix function of pattern = "abacabacada"
is pf[] = 0 0 1 0 1 2 3 4 5 0 1
. pf[8]
is equal to 5, because the longest suffix of "bacabaca", that is also a prefix of "abacabacada" is "abaca", which has length 5. Analogically, pf[9] = 0
because there is no suffix of bacabacad
which is also a prefix of abacabacada
(the pattern).
I hope that this explanation makes the prefix function clearer. Some friends call the array, storing the prefix function fl
, short for "fail link" because while doing the matching, we use the values in this array only when the characters from text
and pattern
mismatch.
Here is a clear implementation of the algorithm (in Java).
Upvotes: 1