PCB

Reputation: 400

std::regex - lookahead assertion not always working

I'm writing a module that makes some string substitutions in text that is then handed to a scripting language. The language's syntax is vaguely Lisp-like: expressions are bounded by parentheses, and symbols are separated by spaces, most of them starting with '$'. A regular expression like this seems like it should give matches at the appropriate symbol boundaries:

auto re_match_abc = std::regex{ "(?=.*[[:space:]()])\\$abc(?=[()[:space:]].*)" };

But in my environment (Visual C++ 2017, 15.9.19, targeting C++17) it can match strings without a suitable boundary in front of them:

std::cout << "  $abc   -> " << std::regex_replace(" $abc ", re_match_abc, "***") << std::endl;
std::cout << " ($abc)  -> " << std::regex_replace("($abc)", re_match_abc, "***") << std::endl;
std::cout << "xyz$abc  -> " << std::regex_replace("xyz$abc ", re_match_abc, "***") << std::endl;
std::cout << " $abcdef -> " << std::regex_replace(" $abcdef", re_match_abc, "***") << std::endl;

// Result from VC++ 2017:
//
//       $abc   ->  ***
//      ($abc)  -> (***)
//     xyz$abc  -> xyz***     <= What's going wrong here?
//      $abcdef ->  $abcdef

Why is that regex ignoring the positive-lookahead requirement to have at least one space or parenthesis before the matching text?

[I realize there are other ways to do this, and that a really robust solution would probably turn the string into a token stream first. For the immediate job, though, regex replacements seemed good enough, especially since the person authoring the strings that get processed is sitting next to me and we can coordinate.]

Upvotes: 5

Views: 396

Answers (1)

Kevin

Reputation: 7324

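A lookahead only inspects the text at and after the current position, even when it is written in front of \$abc. Your first group is therefore satisfied by the space that follows $abc, not by anything in front of it. To check what comes before the match you need to use a positive lookbehind instead. What you really want is this: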

auto re_match_abc = std::regex{ "(?<=[[:space:]()])\\$abc(?=[()[:space:]])" };

You can try it out on a website like https://regex101.com/ (just replace the doubled backslash with a single one; the doubling is only required inside the C++ string literal). It explains what each piece of the regex is doing and shows you everything that matches.

Keep in mind that this will also match things like )$abc).
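To see why the original pattern matched xyz$abc, it can help to test the leading group on its own. The sketch below is only an illustration: the helper name leading_lookahead_matches is made up for this example, and the pattern is the first half of the question's regex without the trailing boundary check.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the question's code): reports whether the
// leading group from the question, followed by the literal $abc, finds a
// match anywhere in `text`.
bool leading_lookahead_matches(const std::string& text)
{
    // (?=...) is a zero-width lookahead. It is evaluated at the position
    // where \$abc is about to start, and scans forward from there.
    static const std::regex re{ "(?=.*[[:space:]()])\\$abc" };
    return std::regex_search(text, re);
}
```

With this helper, "xyz$abc " is accepted because a space follows $abc, while " $abc" is rejected even though a space precedes it. That is the opposite of what a "must have a boundary in front" check should do.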

Edit: std::regex does not support lookbehind (its default ECMAScript grammar only provides lookahead). For your specific case you might try something like this:

    auto re_match_abc = std::regex{ "([[:space:]()])\\$abc(?=[()[:space:]])" };
    std::cout << "  $abc   -> " << std::regex_replace(" $abc ", re_match_abc, "$1***") << std::endl;
    std::cout << " ($abc)  -> " << std::regex_replace("($abc)", re_match_abc, "$1***") << std::endl;
    std::cout << "xyz$abc  -> " << std::regex_replace("xyz$abc ", re_match_abc, "$1***") << std::endl;
    std::cout << " $abcdef -> " << std::regex_replace(" $abcdef", re_match_abc, "$1***") << std::endl;

Output:

  $abc   ->  *** 
 ($abc)  -> (***)
xyz$abc  -> xyz$abc 
 $abcdef ->  $abcdef


Here instead of a lookbehind we have a normal capture group. In the replacement we're emitting whatever we captured (a parenthesis or space) followed by the actual string we want to replace $abc with.
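One limitation of the capture-group version is that it needs an actual character in front of $abc, so a $abc at the very start of the string is skipped. A possible variation, assuming you also want to treat the start and end of the string as boundaries, is sketched below. The helper name replace_abc and its replacement parameter are made up for this example, and I have not tried it on VC++ 2017.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original answer): replaces every $abc
// that stands alone as a symbol with `replacement`.
//
// Caveat: `replacement` is appended to a regex format string, so a '$' in it
// is interpreted as a format escape, and a leading digit would change which
// group "$1" refers to.
std::string replace_abc(const std::string& text, const std::string& replacement)
{
    // Group 1 captures the boundary in front of $abc: either the start of
    // the string (an empty capture) or one space/parenthesis character.
    // The trailing boundary stays a lookahead, so it is not consumed and can
    // serve as the leading boundary of the next symbol.
    static const std::regex re{ "(^|[[:space:]()])\\$abc(?=$|[()[:space:]])" };

    // "$1" re-emits the captured boundary so that only $abc is replaced.
    return std::regex_replace(text, re, "$1" + replacement);
}
```

Called as replace_abc("($abc $abc)", "***"), this should replace both symbols, because the space between them is captured by the second match instead of being consumed by the first.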

Upvotes: 2
