Reputation: 137
I need a regex to match tokens for a syntax highlighter, which should match full words when surrounded by non-alphanumeric characters or string boundaries. The regex I initially came up with is:
(?<=[^\w]|^)TOKEN(?=[^\w]|$)
Where TOKEN is the token I'm searching for. This works in regex testers, but C++'s std::regex doesn't support lookbehinds. Omitting the lookbehind causes the regex to match the character before the token as well, which causes issues. I'm aware boost::regex supports lookbehinds, but I'd like to stick to std::regex if possible.
My question is: can I change my regex to exclude the character before the token from the match?
Upvotes: 0
Views: 199
Reputation: 163207
The pattern is missing a closing ] at the end, and \w also matches \d.
You might use an alternation that asserts either the start of the string or a position where \b does not match (\B), followed by a negative lookahead asserting that there is no word character to the right:
(?:^|\B)TOKEN(?!\w)
After the update of the question, you can write (?<=[^\w]|^)TOKEN(?=[^\w]|$) as (?<=\W|^)TOKEN(?=\W|$), or in short without the lookbehind:
\bTOKEN(?!\w)
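For illustration, here is a minimal sketch of how the lookbehind-free pattern might be used with std::regex. The helper name find_token is hypothetical, and the sketch assumes the token consists only of word characters (letters, digits, underscore), so it needs no escaping and \b is meaningful at its start. Both \b and the negative lookahead (?!\w) are zero-width and are part of the default ECMAScript grammar, so the reported match should start exactly at the token.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the start offset of every whole-word
// occurrence of `token` in `text`.
// Assumption: `token` contains only word characters ([A-Za-z0-9_]),
// so it is inserted into the pattern without escaping.
std::vector<std::size_t> find_token(const std::string& text,
                                    const std::string& token) {
    assert(!token.empty());
    // \b     : zero-width word boundary, so the preceding character
    //          is not part of the match
    // (?!\w) : zero-width negative lookahead, no word character may follow
    const std::regex re("\\b" + token + "(?!\\w)");

    std::vector<std::size_t> offsets;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        offsets.push_back(static_cast<std::size_t>(it->position(0)));
    }
    return offsets;
}
```

With the text "int x = print(int_y); int" and the token "int", only the first and last occurrences qualify: the "int" inside "print" has a word character before it, and the one in "int_y" has a word character after it. For tokens that may contain regex metacharacters, the token would need to be escaped before being spliced into the pattern.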
Upvotes: 1