Computing the second (mis-match) table in the Boyer-Moore String Search Algorithm

Question

For the Boyer-Moore algorithm to be worst-case linear, the computation of the mis-match table must be O(m). However, a naive implementation would loop through all suffixs O(m) and all positions in that that suffix could go and check for equality... which is O(m³)!

Below is the naive implementation of table building algorithm. So this question becomes: How can I improve this algorithm's runtime to O(m)?

def find(s, sub, no):
    n = len(s)
    m = len(sub)

    for i in range(n, 0, -1):
        if s[max(i-m, 0): i] == sub[max(0, m-i):] and \
            (i-m < 1 or s[i-m-1] != no):
            return n-i

    return n

def table(s):
    m = len(s)
    b = [0]*m

    for i in range(m):
        b[i] = find(s, s[m-i:], s[m-i-1])

    return b

print(table('anpanman'))

To put minds at rest, this isn't homework. I'll add revisions when anyone posts ideas for improvements.

moinudin · Accepted Answer

The code under "Preprocessing for the good-suffix heuristics" on this page builds the good-suffix table in O(n) time. It also explains how the code works.

Computing the second (mis-match) table in the Boyer-Moore String Search Algorithm

Answers (1)

Related Questions