Christian
Christian

Reputation: 169

Boyer-Moore-Algorithm: to pass a character

In the original abstract of the Boyer-Moore-algorithm the phrase to pass a character often occurs. What does this phrase mean exactly? I am not a native english speaker and google results are not helpful.

I am asking this question on this board, because the context might seem important.

Upvotes: 0

Views: 191

Answers (3)

Frank Bryce
Frank Bryce

Reputation: 8446

"Pass a character" in this context is closer to "skipping" a character, or not inspecting it.

The following pseudo-code "passes" odd-indexed characters in the char array

char[] charArray = ['a','b','c', ...];

// increment by 2 to "pass" every other character
for (int i=0; i<charArray.Length; i+=2)
{
    Print(charArray[i]);
}

EDIT: while the above is true in many contexts, in the context of the link you provided this is NOT how it is used (as was pointed out to me by @KevinKirkpatrick)

In brief, a "passed character" in the context of the link you provided means "any character that will not be inspected in the future".

In other words, if you are looping through an array by incrementing (and not decrementing) an index then "passed" characters are all characters that are at a lower index than your current index. This means "passing a character" at index i means incrementing your index to be greater than i.

Upvotes: 0

KevinKirkpatrick
KevinKirkpatrick

Reputation: 1456

The algorithm has two loops: an outer loop over potential matching substrings and an inner loop over the characters of each potential matching substring.

Consider searching for "BIG" in string "FROGBIG".

The outer loop iterates over potential locations that "BIG" might occur - in FROGBIG, there are (at most) 5 such locations.

12345
FROGBIG
BIG     (outer loop index=1)
  BIG   (outer loop index=3)

For each comparison, the inner loop iterates over the characters of the pattern. So for outer loop index of 1, the inner loop will compare "BIG" and FRO" iterating over the string BIG, backwards, one character at a time (at most, 3 comparison).

123
ORF
GIB
G   (inner loop index = 1)
  B (inner loop index = 3)

The algorithm (which I won't get into) is optimizing these loops so as to avoid needless comparisons. That is, during the comparison of FRO and BIG (outer loop 1), the inner loop will only compare "G" vs "O" (inner loop 1), not "R" vs "I" (inner loop 2) or "B" vs "F" (inner loop 3). Furthermore, after comparing "FRO" to "BIG", the outer loop will skip over ROG vs BIG (outer loop 2) and OGB vs BIG (outer loop 3), moving immediately to GBI vs BIG (outer loop 4).

It is in discussing the inner loop (over "G", "I", and "B") that the authors first make use of the word "pass". When the authors say "pass a character", "character" is referring to individual characters "G", "I", and "B" over which the inner loop iterates. The word "pass" is used in the same way I might give street directions: "You will pass Oak Street and pass Maple Street to get to Olive Street". But instead of passing streets, the authors are describing "passing characters": "You will pass G and pass I to get to B".

To further clarify, the authors also frequently use the word "pass" as the adjective "passed". So, they often refer to a "passed character". In this case, "passed characters" are just the characters that the inner loop had to check (or "pass by") to get to the current character. In my street directions, once the traveler reaches Olive Street, one would call Oak Street and Maple Street "passed streets". In the algorithm, if the inner loop is at "I", then "G" is a "passed character"; if it is at "B" then the "passed characters" refers to both "G" and "I".

Finally, the authors continue using the convention "passed characters" in analyzing the overall efficiency. From the paper, "By dividing the number of references to string by the number of characters i - 1 passed before the pattern was found". Here, I believe they're referring to the number of characters in the search string which had to be passed over before the pattern was found. If I'm not mistaken, this seems a bit oblique: the "number of characters passed" is really just "the position of the first occurrence of the pattern in the string". Thus, in fig 1, efficiency is really measured as, "for a pattern of a certain length (3 for "BIG"), what % of the potential matching strings (5 for "FROGBIG": FRO, ROG, OGB, GBI, and BIG) did the algorithm check?"

Upvotes: 1

gilgamash
gilgamash

Reputation: 902

your link already rephrases the "pass more characters than it examines" as follows:

Thus, the algorithm has the unusual property that, in most cases, not all of the first "i" characters of "string" are inspected.

I.O.W.: Not all characters passed to the algorithm are necessarily inspected.

Hope that helps!

Upvotes: 0

Related Questions