ashur
ashur

Reputation: 4317

KMP DFA prefix function

I was asked to learn about KMP DFA, and what I've found in my book is that implementation, but our lecturer calls something "the prefix function" all the time. I really can't understand which part is this function here, can someone explain it to me ? I'm sorry if that was asked somewhere, but I couldn't find it.

public class KMP {
private String pat;
private String t;
private int[][] fsm;

public static final int ALPHABET = 256;

public KMP(String pat) {
    this.pat = pat;
    char[] pattern = pat.toCharArray();

    int M = pattern.length;

    fsm = new int[ALPHABET][pattern.length];
    fsm[pattern[0]][0] = 1;

    for(int X = 0, j = 1; j < M; j++) {

        for(int c = 0; c < ALPHABET; c++) {
            fsm[c][j] = fsm[c][X];
        }
        fsm[pattern[j]][j] = j + 1;
        X = fsm[pattern[j]][X];
    }
    display(fsm);
}

public void search(String t) {
    char[] text = t.toCharArray();
    this.t = t;
    int N = text.length;
    int M = pat.length();

    int i, j;
    for(i = 0, j = 0; i < N; i++) {
        j = fsm[t.charAt(i)][j];
        if(j == M) {
            System.out.println("Found at " + (i - M + 1));
            j = 0;
        }
    }
}

Upvotes: 0

Views: 1059

Answers (1)

yasen
yasen

Reputation: 1260

The KMP algorithm does not construct a DFA. What you have implemented is looks more like a DFA, which recognizes some string pattern.

The idea behind KMP algorithm is to construct the so called prefix function for the given pattern. And what is this function? It's definition is that for each position i of the string we are interested in the length of the longest suffix of pattern[1..i], which is also a prefix of the pattern string (0-indexed). This may sound confusing, but here is an example:

The prefix function of pattern = "abacabacada" is pf[] = 0 0 1 0 1 2 3 4 5 0 1. pf[8] is equal to 5, because the longest suffix of "bacabaca", that is also a prefix of "abacabacada" is "abaca", which has length 5. Analogically, pf[9] = 0 because there is no suffix of bacabacad which is also a prefix of abacabacada (the pattern).

I hope that this explanation makes the prefix function clearer. Some friends call the array, storing the prefix function fl, short for "fail link" because while doing the matching, we use the values in this array only when the characters from text and pattern mismatch.

Here is a clear implementation of the algorithm (in Java).

Upvotes: 1

Related Questions