Dmitriy
Dmitriy

Reputation: 3435

Correct implementation of Boyer Moore algorithm

I tried to use several implementations, but all of them had bugs.
Search at SO gave me http://www-igm.univ-mlv.fr/~lecroq/string/node14.html - looks nice, but this implementation gave me wrong results - sometimes it doesn't found a string.
I spent couple hours to find the bug.

Following line looks fine:

j += MAX(bmGs[i], bmBc[y[i + j]] - m + 1 + i);

but y is char * and char is signed! It means that y[i + j] could be negative (what happens in one of my tests).

My question is: Where to find correct implementation of Boyer Moore algorithm?

Upvotes: 3

Views: 4631

Answers (3)

TamaMcGlinn
TamaMcGlinn

Reputation: 3248

As of C++17, this is built into the STL. Just use std::boyer_moore_searcher. For example:

#include <algorithm>
#include <string>
#include <functional>

int main()
{
    std::string haystack = "The quick brown fox jumped over the lazy dog";
    std::string needle = "brown";
    const auto s = std::boyer_moore_searcher<std::string::iterator>(needle.begin(), needle.end());
    const auto it = std::search(haystack.begin(), haystack.end(), s);
    return 0;
}

Upvotes: 1

caf
caf

Reputation: 239321

char isn't definitively signed or unsigned - it's unspecified, and left up to the implementation to define.

If the algorithm depends on char being unsigned, then it should explicitly cast the input pointers to unsigned char (which is how the C standard library string handling functions are defined to work - all comparisons are done treating the characters in the string as unsigned char).

Upvotes: 4

In silico
In silico

Reputation: 52207

Have you tried Wikipedia?

Or the PDF written by the inventors of the algorithm?

Upvotes: 4

Related Questions