I'm trying to match all instances of a whole word, along with leetspeak alternatives, in a string. For example, let's take this string:
brΦwn 8rown The quiçκ brΦwη fox βrown b®øwΠ brownie
I'm trying to capture the 5 "brown" instances, but not the "brownie". I have the following regex to match this:
/\b(b|b\.|b_|b\-|8|\|3|ß|Β|β)(r|r\.|r_|r\-|®)(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)(w|w\.|w_|w\-|ω|ψ|Ψ)(n|n\.|n_|n\-|η|Ν|Π)\b/i
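The issue seems to be that \b is a zero-width assertion that only matches between a word character (\w, which here covers only ASCII letters, digits and underscore) and a non-word character. Characters like Π, η and β are non-word characters, so there is no boundary between Π and the space after it (or between a space and a leading β), and the match fails. Targeting whitespace with \s doesn't work for consecutive words either, because \s consumes the space: it would only match the first "brown" in "the quick brown brown fox".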
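To reproduce both problems outside Regex101, here is a small sketch assuming a JavaScript engine, where \w (and therefore \b) only knows [A-Za-z0-9_]. CORE is the question's pattern without delimiters or flags; findAll and the other variable names are illustrative, not from the question.

```javascript
// The question's five letter groups, unchanged (delimiters and flags dropped).
const CORE = /(b|b\.|b_|b\-|8|\|3|ß|Β|β)(r|r\.|r_|r\-|®)(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)(w|w\.|w_|w\-|ω|ψ|Ψ)(n|n\.|n_|n\-|η|Ν|Π)/.source;

// \b is zero-width: it only holds between a \w character and a non-\w one.
const withWordBoundary = new RegExp(String.raw`\b${CORE}\b`, "gi");

// \s really consumes the space, so two neighboring words can't share it.
const withWhitespace = new RegExp(String.raw`\s${CORE}\s`, "gi");

// Collect the full text of every match.
const findAll = (re, text) => Array.from(text.matchAll(re), (m) => m[0]);

const sample = "brΦwn 8rown The quiçκ brΦwη fox βrown b®øwΠ brownie";

// brΦwη and b®øwΠ (non-\w last letter) and βrown (non-\w first letter) are missed.
console.log(findAll(withWordBoundary, sample)); // [ 'brΦwn', '8rown' ]

// Only the first of two consecutive words is found.
console.log(findAll(withWhitespace, "the quick brown brown fox").length); // 1
```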
Any suggestions for how to make this work?
A Regex101 setup: https://regex101.com/r/LKo9Xf/4
Maybe we could use two lookarounds instead of word boundaries, such as (?<!\S) and (?!\S), which, as sln pointed out, are functionally better whitespace boundaries and much quicker; or:
(?<=^|\s)(b|b\.|b_|b\-|8|\|3|ß|Β|β)(r|r\.|r_|r\-|®)(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)(w|w\.|w_|w\-|ω|ψ|Ψ)(n|n\.|n_|n\-|η|Ν|Π)(?=\s|$)
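A sketch of both lookaround versions in JavaScript, assuming an ES2018-or-later engine (for example a current Node.js) since lookbehind is required. CORE is the question's pattern unchanged; the variable names and the findAll helper are hypothetical.

```javascript
// The question's five letter groups, unchanged.
const CORE = /(b|b\.|b_|b\-|8|\|3|ß|Β|β)(r|r\.|r_|r\-|®)(o|o\.|o_|o\-|0|Ο|ο|Φ|¤|°|ø)(w|w\.|w_|w\-|ω|ψ|Ψ)(n|n\.|n_|n\-|η|Ν|Π)/.source;

// sln's form: not preceded by, and not followed by, a non-whitespace character.
// This also covers the start and end of the string.
const negatedLookarounds = new RegExp(String.raw`(?<!\S)${CORE}(?!\S)`, "gi");

// The explicit form: preceded by start-of-string or whitespace,
// followed by whitespace or end-of-string.
const explicitLookarounds = new RegExp(String.raw`(?<=^|\s)${CORE}(?=\s|$)`, "gi");

// Collect the full text of every match.
const findAll = (re, text) => Array.from(text.matchAll(re), (m) => m[0]);

const sample = "brΦwn 8rown The quiçκ brΦwη fox βrown b®øwΠ brownie";

for (const re of [negatedLookarounds, explicitLookarounds]) {
  // Lookarounds inspect the neighboring characters without consuming them.
  console.log(findAll(re, sample)); // [ 'brΦwn', '8rown', 'brΦwη', 'βrown', 'b®øwΠ' ]
  console.log(findAll(re, "the quick brown brown fox")); // [ 'brown', 'brown' ]
}
```

Because neither lookaround consumes the surrounding space, both words in "the quick brown brown fox" are matched, while "brownie" is still rejected.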
Does this do the trick?
(?:b[._-]?|8|\|3|[ßΒβ])(?:r[._-]?|®)(?:o[._-]?|[0ΟοΦ¤°ø])(?:w[._-]?|[ωψΨ])(?:n[._-]?|[ηΝΠ])(?= )
Character classes work for the single-character leetspeak variants; a class matches exactly one character, so multi-character variants such as b. or |3 still need alternation. This results in a single match for each word. Also, instead of using \b, I am using a positive lookahead for a space at the end of each word.
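A JavaScript sketch of this lookahead-only approach, assuming a JS engine and the i flag from the question. The pattern is an illustrative character-class rewrite of the question's groups (single characters in classes, multi-character variants kept as alternatives); findAll is a hypothetical helper.

```javascript
// Single-character variants sit in character classes; the multi-character ones
// (a letter followed by ".", "_" or "-", and "|3") stay as alternatives.
const charClassLookahead =
  /(?:b[._-]?|8|\|3|[ßΒβ])(?:r[._-]?|®)(?:o[._-]?|[0ΟοΦ¤°ø])(?:w[._-]?|[ωψΨ])(?:n[._-]?|[ηΝΠ])(?= )/gi;

// Collect the full text of every match.
const findAll = (re, text) => Array.from(text.matchAll(re), (m) => m[0]);

const sample = "brΦwn 8rown The quiçκ brΦwη fox βrown b®øwΠ brownie";

console.log(findAll(charClassLookahead, sample)); // [ 'brΦwn', '8rown', 'brΦwη', 'βrown', 'b®øwΠ' ]

// (?= ) needs a literal space after the word, so a word at the very end is skipped...
console.log(findAll(charClassLookahead, "the quick brown")); // []
// ...and nothing checks the left side, so "brown" inside "xbrown" still matches.
console.log(findAll(charClassLookahead, "xbrown fox")); // [ 'brown' ]
```

The (?<!\S) and (?!\S) lookarounds from the other answer handle both of these edge cases.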