Reputation: 8159

Whole word search with multiple matches and non-word characters

I'm trying to match all instances of a whole word, along with leetspeak alternatives, in a string. For example, let's take this string:

brΦwn 8rown The quiçκ brΦwη fox βrown b®øwΠ brownie

I'm trying to capture the 5 "brown" instances, but not the "brownie". I have the following regex to match this:

/\b(b|b\.|b_|b\-|8|\|3|ß|Β|β)(r|r\.|r_|r\-|®)(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)(w|w\.|w_|w\-|ω|ψ|Ψ)(n|n\.|n_|n\-|η|Ν|Π)\b/i

The issue seems to be that \b only matches between a word character and a non-word character. Because characters like Π are non-word characters, there is no boundary between Π and the space after it, so the match fails. Targeting whitespace with \s doesn't work for consecutive words either (it would only match the first "brown" in "the quick brown brown fox").
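A minimal Python sketch of the behavior described above. The question's regex uses JavaScript-style syntax, so `re.ASCII` is an assumption made here to mimic an engine whose \b only treats [A-Za-z0-9_] as word characters. The names `LETTERS`, `CORE`, and `found` are illustrative only.

```python
import re

# The question's pattern, split into one group per letter for readability.
LETTERS = [
    r"(b|b\.|b_|b\-|8|\|3|ß|Β|β)",
    r"(r|r\.|r_|r\-|®)",
    r"(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)",
    r"(w|w\.|w_|w\-|ω|ψ|Ψ)",
    r"(n|n\.|n_|n\-|η|Ν|Π)",
]
CORE = "".join(LETTERS)

text = "brΦwn 8rown The quiçκ brΦwη fox βrown b®øwΠ brownie"

# Assumption: re.ASCII stands in for an ASCII-only \b, as in the question.
ascii_boundary = re.compile(r"\b" + CORE + r"\b", re.IGNORECASE | re.ASCII)

# Only the words that start and end with an ASCII word character survive.
found = [m.group(0) for m in ascii_boundary.finditer(text)]
print(found)
```

Words such as "brΦwη" and "βrown" are skipped because the character at the edge and the neighboring space are both non-word characters, so no \b exists between them.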

Any suggestions for how to make this work?

A Regex101 setup: https://regex101.com/r/LKo9Xf/4

Upvotes: 0

Answers (2)

Emma

Reputation: 27743

Maybe we could use two lookarounds instead of word boundaries. As sln notes:

(?<!\S) and (?!\S) are functionally better whitespace boundaries, and much quicker.

Alternatively:

(?<=^|\s)(b|b\.|b_|b\-|8|\|3|ß|Β|β)(r|r\.|r_|r\-|®)(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)(w|w\.|w_|w\-|ω|ψ|Ψ)(n|n\.|n_|n\-|η|Ν|Π)(?=\s|$)
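As a rough check of the whitespace-boundary idea, here is a Python sketch using (?<!\S) and (?!\S). Python's `re` module requires fixed-width lookbehind, so the (?<=^|\s) form is not used here. The variable names are illustrative, and the sample strings come from the question.

```python
import re

# Same letter groups as in the question's pattern.
CORE = (
    r"(b|b\.|b_|b\-|8|\|3|ß|Β|β)"
    r"(r|r\.|r_|r\-|®)"
    r"(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)"
    r"(w|w\.|w_|w\-|ω|ψ|Ψ)"
    r"(n|n\.|n_|n\-|η|Ν|Π)"
)

# (?<!\S): not preceded by a non-whitespace character (start of string or whitespace).
# (?!\S):  not followed by a non-whitespace character (end of string or whitespace).
whole_word = re.compile(r"(?<!\S)" + CORE + r"(?!\S)", re.IGNORECASE)

text = "brΦwn 8rown The quiçκ brΦwη fox βrown b®øwΠ brownie"
matches = [m.group(0) for m in whole_word.finditer(text)]

# Lookarounds consume no characters, so adjacent words can share one space.
consecutive = [m.group(0) for m in whole_word.finditer("the quick brown brown fox")]

print(matches)
print(consecutive)
```

Because the lookarounds are zero-width, the space between two consecutive words is never consumed, which is what makes the second "brown" reachable.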

Upvotes: 0

Bee

Reputation: 1306

Does this do the trick?

[b|b\.|b_|b\-|8|\|3|ß|Β|β][r|r\.|r_|r\-|®][o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø][w|w\.|w_|w\-|ω|ψ|Ψ][n|n\.|n_|n\-|η|Ν|Π](?= )

You need to use character classes for the different leetspeak variations. This will result in a single match for each word. Also, instead of using \b, I am using a positive lookahead for a space at the end of each word.
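Two caveats about this approach may be worth checking, sketched here in Python. Inside a character class, | is a literal character rather than alternation, and a class matches exactly one character. The `r_group` rewrite is a hypothetical alternative, not part of the answer above.

```python
import re

# The "r" class from the pattern above: it matches ONE character out of
# r, |, ., _, - or ®, so "|" alone is accepted and "r." is not matched as a unit.
r_class = re.compile(r"[r|r\.|r_|r\-|®]")

# Hypothetical rewrite with real alternation: "r" plus an optional separator, or "®".
r_group = re.compile(r"(?:r[._-]?|®)")

pipe_matches_class = r_class.fullmatch("|") is not None
rdot_matches_class = r_class.fullmatch("r.") is not None
rdot_matches_group = r_group.fullmatch("r.") is not None

# (?= ) requires a literal space after the word, so a word at the very end
# of the string is missed; (?!\S) also accepts the end of the string.
with_space = re.findall(r"brown(?= )", "brown brown")
with_ws_boundary = re.findall(r"brown(?!\S)", "brown brown")

print(pipe_matches_class, rdot_matches_class, rdot_matches_group)
print(with_space, with_ws_boundary)
```

The pattern above also has no check before the first letter, so it can match inside a longer token; a leading (?<!\S) would be one way to address that.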

Upvotes: 1
