Velkan

Reputation: 7592

std::regex infinite loop with gcc 5.4

The std::regex_match() call below never returns (GCC 5.4.0 20160609, x86_64, Ubuntu 16.04).

But it works in some online compilers: http://cpp.sh/ is OK, for example.

The code tries to match an INI-style section header with a possible "#"-comment.

#include <regex>
#include <string>

int main(int argc, char *argv[])
{
    std::regex headerPattern("([[:blank:]]*\\[[[:blank:]]*((?:[[:blank:]]*[^[:space:]]+[[:blank:]]*?)+)[[:blank:]]*\\][[:blank:]]*(?:#(?:[^[:space:]]*[[:blank:]]*)*)?)");
    std::smatch headerMatch;
    std::string l("[Hdr 100] # ------------ 22 22 4444 88888888 333");
    return std::regex_match(l, headerMatch, headerPattern) ? 0 : 1;
}

Build:

g++ -std=c++11 main.cpp -o main

Is there really a problem with the code?

Upvotes: 0

Views: 169

Answers (1)

Wiktor Stribiżew

Reputation: 627082

The pattern causes catastrophic backtracking. The culprit is the repeated group ((?:[[:blank:]]*[^[:space:]]+[[:blank:]]*?)+) (for simplicity, let me write it in PCRE syntax as ((?:\h*\S+\h*?)+)): it matches 0+ horizontal whitespace chars, then 1+ non-whitespace chars, then 0+ horizontal whitespace chars, and the whole sequence is quantified with +. Since both whitespace parts are optional, this reduces to the classic (a+)+ case: a run of non-whitespace chars can be divided between iterations in many different ways, and the engine tries all of them whenever the part of the pattern after the group fails to match.
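To get a feel for the numbers, here is a rough sketch that counts in how many ways a run of n non-whitespace chars can be cut into consecutive non-empty pieces, which is what the nested quantifiers allow when the whitespace parts match nothing. The function name countSplits is made up for this illustration, and it only models the counting argument, not what the regex engine does internally.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only: counts the ways to cut a run of
// n characters into consecutive non-empty pieces. Each piece stands for one
// iteration of the outer + consuming some characters with the inner \S+.
std::uint64_t countSplits(unsigned n) {
    if (n == 0) {
        return 1;  // nothing left to cut: exactly one way to stop
    }
    std::uint64_t total = 0;
    // Choose the length of the first piece, then cut the remainder recursively.
    for (unsigned first = 1; first <= n; ++first) {
        total += countSplits(n - first);
    }
    return total;
}
```

The count doubles with every extra character (2^(n-1) for n >= 1), so countSplits(12) already gives 2048 for a single 12-char run, and the counts for separate runs multiply.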

You need to unroll this group and the group that matches the comment, as follows:

std::regex headerPattern("([[:blank:]]*\\[[[:blank:]]*([^[:space:]]+(?:[[:blank:]]+[^[:space:]]+)*)[[:blank:]]*\\][[:blank:]]*(?:#[^[:space:]]*(?:[[:blank:]]+[^[:space:]]+)*)?)");

See the regex demo. And here is a PCRE-converted variant to understand the difference: the group I mentioned above is now \S+(?:\h+\S+)*: 1+ non-whitespace chars, followed by 0+ sequences of 1+ horizontal whitespace chars and then 1+ non-whitespace chars. The comment part after # (a non-capturing group) is changed to \S*(?:\h+\S+)*: 0+ non-whitespace chars followed by 0+ sequences of 1+ horizontal whitespace chars and then 1+ non-whitespace chars. Every iteration now has to start with at least one whitespace char, so there is only one way to divide a run of non-whitespace chars between iterations.

Just replace \h with [[:blank:]] (or [^\\S\r\n]) and \S with [^[:space:]] (or keep \S, since std::regex supports it) to convert that PCRE pattern back to the syntax you used.
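For reference, a minimal sketch of how the unrolled pattern could be used on the line from the question. The names kHeaderPattern and sectionName are made up for this example, and the pattern is only split across several string literals for readability.

```cpp
#include <regex>
#include <string>

// The unrolled pattern from this answer, split into pieces for readability.
// kHeaderPattern is a name invented for this sketch.
static const std::regex kHeaderPattern(
    "([[:blank:]]*\\[[[:blank:]]*"                      // leading blanks and "["
    "([^[:space:]]+(?:[[:blank:]]+[^[:space:]]+)*)"     // group 2: section name
    "[[:blank:]]*\\][[:blank:]]*"                       // "]" and blanks around it
    "(?:#[^[:space:]]*(?:[[:blank:]]+[^[:space:]]+)*)?)");  // optional "#" comment

// Hypothetical helper: returns the section name if the whole line is a
// header, or an empty string otherwise.
std::string sectionName(const std::string& line) {
    std::smatch m;
    if (std::regex_match(line, m, kHeaderPattern)) {
        return m[2].str();
    }
    return std::string();
}
```

With the line from the question, sectionName("[Hdr 100] # ------------ 22 22 4444 88888888 333") should return "Hdr 100", and a line that is not a header should give an empty string.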

Upvotes: 1
