Reputation: 400
I'm writing a module that's making some string substitutions into text to give to a scripting language. The language's syntax is vaugely lisp-y, so expressions are bounded by parentheses and symbols separated by spaces, most of them starting with '$'. A regular expression like this seems like it should give matches at the appropriate symbol boundaries:
auto re_match_abc = std::regex{ "(?=.*[[:space:]()])\\$abc(?=[()[:space:]].*)" };
But in my environment (Visual C++ 2017, 15.9.19, targetting C++-17) it can match strings without a suitable boundary in front of them:
std::cout << " $abc -> " << std::regex_replace(" $abc ", re_match_abc, "***") << std::endl;
std::cout << " ($abc) -> " << std::regex_replace("($abc)", re_match_abc, "***") << std::endl;
std::cout << "xyz$abc -> " << std::regex_replace("xyz$abc ", re_match_abc, "***") << std::endl;
std::cout << " $abcdef -> " << std::regex_replace(" $abcdef", re_match_abc, "***") << std::endl;
// Result from VC++ 2017:
//
// $abc -> ***
// ($abc) -> (***)
// xyz$abc -> xyz*** <= What's going wrong here?
// $abcdef -> $abcdef
Why is that regex ignoring the positive-lookahead requirement to have at least one space or parenthesis before the matching text?
[I realize that there are other ways to do this job and to do it really robustly maybe I should use something to turn the string into a token stream, but for the immediate job I have (and because the person authoring the strings that get processed is sitting next to me, so we can coordinate) I thought that regex replacements would do for now.]
Upvotes: 5
Views: 396
Reputation: 7324
You need to use a positive lookbehind instead. What you really want is this:
auto re_match_abc = std::regex{ "(?<=[[:space:]()])\\$abc(?=[()[:space:]])" };
You can try it out on a website like https://regex101.com/ (just remove the escaped backslash that's required for the C++ string). It explains what each piece of the regex is doing and shows you everything that matches.
Keep in mind that this will also match things like )$abc)
Edit: std::regex
apparently does not support lookbehind. For you specific case you might try something like this:
auto re_match_abc = std::regex{ "([[:space:]()])\\$abc(?=[()[:space:]])" };
std::cout << " $abc -> " << std::regex_replace(" $abc ", re_match_abc, "$1***") << std::endl;
std::cout << " ($abc) -> " << std::regex_replace("($abc)", re_match_abc, "$1***") << std::endl;
std::cout << "xyz$abc -> " << std::regex_replace("xyz$abc ", re_match_abc, "$1***") << std::endl;
std::cout << " $abcdef -> " << std::regex_replace(" $abcdef", re_match_abc, "$1***") << std::endl;
output:
$abc -> ***
($abc) -> (***)
xyz$abc -> xyz$abc
$abcdef -> $abcdef
Here instead of a lookbehind we have a normal capture group. In the replacement we're emitting whatever we captured (a parenthesis or space) followed by the actual string we want to replace $abc
with.
Upvotes: 2