Michael
Michael

Reputation: 42100

Regular expression matching algorithm in Java

This article says that regexp matching in Java is slow because regexps with "back references" cannot be matched efficiently. The article explains efficient Thomson's NFA-based matching algorithm (invented in 1968) which works for regexps without "back references". However the Pattern javadoc says Java regexps use NFA-based approach.

Now I wonder how efficient Java regexp matching is and what algorithm it uses.

Upvotes: 9

Views: 2876

Answers (1)

Prabhakaran Ramaswamy
Prabhakaran Ramaswamy

Reputation: 26094

java.util.regex.Pattern uses Boyer–Moore string search algorithm

/* Attempts to match a slice in the input using the Boyer-Moore string
 * matching algorithm. The algorithm is based on the idea that the
 * pattern can be shifted farther ahead in the search text if it is
 * matched right to left.
 */

private void compile() {
    ----------------------
    -----------------------

   if (matchRoot instanceof Slice) {
        root = BnM.optimize(matchRoot);
        if (root == matchRoot) {
            root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
        }
    } else if (matchRoot instanceof Begin || matchRoot instanceof First) {
        root = matchRoot;
    } else {
        root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
    }
}

Upvotes: 1

Related Questions