PythonPower
PythonPower

Reputation:

Computing the second (mis-match) table in the Boyer-Moore String Search Algorithm

For the Boyer-Moore algorithm to be worst-case linear, the computation of the mis-match table must be O(m). However, a naive implementation would loop through all suffixs O(m) and all positions in that that suffix could go and check for equality... which is O(m3)!

Below is the naive implementation of table building algorithm. So this question becomes: How can I improve this algorithm's runtime to O(m)?

def find(s, sub, no):
    n = len(s)
    m = len(sub)

    for i in range(n, 0, -1):
        if s[max(i-m, 0): i] == sub[max(0, m-i):] and \
            (i-m < 1 or s[i-m-1] != no):
            return n-i

    return n

def table(s):
    m = len(s)
    b = [0]*m

    for i in range(m):
        b[i] = find(s, s[m-i:], s[m-i-1])

    return b

print(table('anpanman'))

To put minds at rest, this isn't homework. I'll add revisions when anyone posts ideas for improvements.

Upvotes: 4

Views: 1874

Answers (1)

moinudin
moinudin

Reputation: 138487

The code under "Preprocessing for the good-suffix heuristics" on this page builds the good-suffix table in O(n) time. It also explains how the code works.

Upvotes: 1

Related Questions