Reputation: 12073
I have a particular regular expression:
#\b[a-z0-9-_%"]+\b#gi
I have the following test string I am applying that regex filter to:
abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%
However, the detected word boundaries are as follows (square brackets represent boundaries):
[abc] [def] [ghi] [jkl] [mno] %%[car%] [__car_] [tall-person] "[thing"] [20%] %[30%]
So, certain types of punctuation ("_") are recognized at both the beginning and end of the word as "word characters." On the other hand, other types ("%" or double quotes) are ignored when they are at the beginning of the word. Why is this?
Upvotes: 4
Views: 2718
Reputation: 369024
In a word boundary, "word" means whatever the \w metacharacter matches, which in most regular expression engines is [A-Za-z0-9_]. So _ counts as a word character, while % and " do not: \b matches between one of them and an adjacent word character.
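To see where \b can match, here is a small sketch that lists every position in a string where the engine reports a word boundary. The helper name wordBoundaryPositions is made up for illustration, and the sticky (y) flag assumes a reasonably modern JavaScript engine.

```javascript
// Hypothetical helper: return every index in `text` where \b matches.
// \b is zero-width, so each result is a position between characters.
function wordBoundaryPositions(text) {
  const positions = [];
  for (let i = 0; i <= text.length; i++) {
    const re = /\b/y;   // sticky: the match must start exactly at lastIndex
    re.lastIndex = i;
    if (re.test(text)) positions.push(i);
  }
  return positions;
}

console.log(wordBoundaryPositions('%%car'));  // [ 2, 5 ]
console.log(wordBoundaryPositions('__car'));  // [ 0, 5 ]
console.log(wordBoundaryPositions('"thing')); // [ 1, 6 ]
```

For %%car the first boundary is at index 2, between the second % and c, so a pattern that begins with \b cannot start matching at either % sign. For __car the first boundary is at index 0, because _ is itself a word character.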
I think you don't need the word boundary at all:
// javascript example
> 'abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%'.match(/[a-z0-9-_%"]+/g)
["abc", "def", "ghi", "jkl", "mno", "%%car%", "__car_", "tall-person", "\"thing\"", "20%", "%30%"]
Upvotes: 3