Reputation: 7592
The std::regex_match() call below never finishes (GCC 5.4.0 20160609, x86_64, Ubuntu 16.04).
But it works in some online compilers: http://cpp.sh/ is OK, for example.
The code tries to match an INI-style section header with a possible "#"-comment.
#include <regex>
#include <string>

int main(int argc, char *argv[])
{
    // Header in brackets, optionally followed by a "#"-comment.
    std::regex headerPattern("([[:blank:]]*\\[[[:blank:]]*((?:[[:blank:]]*[^[:space:]]+[[:blank:]]*?)+)[[:blank:]]*\\][[:blank:]]*(?:#(?:[^[:space:]]*[[:blank:]]*)*)?)");
    std::smatch headerMatch;
    std::string l("[Hdr 100] # ------------ 22 22 4444 88888888 333");
    // Exit code 0 on a match, 1 otherwise.
    return std::regex_match(l, headerMatch, headerPattern) ? 0 : 1;
}
Build:
g++ -std=c++11 main.cpp -o main
Is there really a problem with the code?
Upvotes: 0
Views: 169
Reputation: 627082
The pattern causes catastrophic backtracking. It happens because you have a repeated capturing group ((?:[[:blank:]]*[^[:space:]]+[[:blank:]]*?)+) (for simplicity, let me write it in PCRE syntax as ((?:\h*\S+\h*?)+)). It matches 0+ horizontal whitespace chars, then 1 or more non-whitespace chars, followed by 0+ horizontal whitespace chars, and all of this is quantified with +. Since both whitespace parts are optional, the group reduces to (\S+)+ on any run of non-whitespace chars. This is the classic (a+)+ case, and it makes catastrophic backtracking inevitable.
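To get a feel for the numbers, here is a small sketch that counts how many ways a run of n non-whitespace chars can be divided among iterations of a (\S+)+ style group. The function name countSplits is hypothetical, and this is only a model of the search space: a real engine may visit fewer or more states depending on its implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only.
// Counts the ways a run of n non-whitespace chars can be cut into one or
// more non-empty consecutive pieces, i.e. the ways (\S+)+ can distribute
// the run among its iterations. The result is 2^(n-1) for n >= 1.
std::uint64_t countSplits(std::size_t n) {
    if (n == 0) {
        return 1; // nothing left to split: one way to finish
    }
    std::uint64_t total = 0;
    // The first iteration of the group takes `first` chars,
    // the remaining iterations split whatever is left.
    for (std::size_t first = 1; first <= n; ++first) {
        total += countSplits(n - first);
    }
    return total;
}
```

For the 12-char run "------------" in the sample line this gives 2048 candidate splits, and the count doubles with every extra char, which is why a backtracking engine can appear to hang.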
You need to unroll this group and the comment group in the following way:
std::regex headerPattern("([[:blank:]]*\\[[[:blank:]]*([^[:space:]]+(?:[[:blank:]]+[^[:space:]]+)*)[[:blank:]]*\\][[:blank:]]*(?:#[^[:space:]]*(?:[[:blank:]]+[^[:space:]]+)*)?)");
Written in PCRE syntax to make the difference easier to see, the group I mentioned above is now \S+(?:\h+\S+)*: 1+ non-whitespace chars, followed by 0+ sequences of 1+ horizontal whitespace chars followed by 1+ non-whitespace chars. The comment part after # is changed to \S*(?:\h+\S+)*: 0+ non-whitespace chars followed by 0+ sequences of 1+ horizontal whitespace chars followed by 1+ non-whitespace chars. Each char can now be matched in only one way, so there is nothing to backtrack into.
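As a rough usage sketch, the unrolled pattern above can be wrapped in a small helper that also returns the section name from capture group 2. The function name parseHeader and its out-parameter are assumptions made for illustration; they are not part of the original code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true and stores the section name in `name`
// when `line` is an INI-style header such as "[Hdr 100] # comment".
bool parseHeader(const std::string& line, std::string& name) {
    // Unrolled pattern: every char can be consumed in only one way,
    // so the engine has no alternative splits to try.
    static const std::regex headerPattern(
        "([[:blank:]]*\\[[[:blank:]]*"
        "([^[:space:]]+(?:[[:blank:]]+[^[:space:]]+)*)"
        "[[:blank:]]*\\][[:blank:]]*"
        "(?:#[^[:space:]]*(?:[[:blank:]]+[^[:space:]]+)*)?)");
    std::smatch m;
    if (!std::regex_match(line, m, headerPattern)) {
        return false;
    }
    name = m[2].str(); // group 2 is the text between the brackets
    return true;
}
```

Called with the line from the question, this should return true and set name to "Hdr 100". A line without brackets, such as "key = value", should return false.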
To convert that PCRE pattern back to the syntax you used, just replace \h with [[:blank:]] (or [^\\S\r\n]) and \S with [^[:space:]] (or keep \S, since std::regex supports it).
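If you prefer to write the pattern in the shorter PCRE form and convert it mechanically, a naive textual substitution is enough for simple patterns like this one. The helpers replaceAll and pcreToStdRegex below are hypothetical names, and the sketch assumes the pattern contains no escaped backslashes (such as \\h meaning a literal backslash followed by h), which it would convert incorrectly.

```cpp
#include <cstddef>
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string& from,
                       const std::string& to) {
    std::size_t pos = text.find(from);
    while (pos != std::string::npos) {
        text.replace(pos, from.size(), to);
        // Continue after the inserted text so it is never rescanned.
        pos = text.find(from, pos + to.size());
    }
    return text;
}

// Hypothetical helper: rewrites the PCRE shorthands \h and \S into the
// POSIX-class forms used in the question. Purely textual, not a parser.
std::string pcreToStdRegex(std::string pattern) {
    pattern = replaceAll(pattern, "\\h", "[[:blank:]]");
    pattern = replaceAll(pattern, "\\S", "[^[:space:]]");
    return pattern;
}
```

For example, pcreToStdRegex("\\S+(?:\\h+\\S+)*") yields [^[:space:]]+(?:[[:blank:]]+[^[:space:]]+)*, which is the unrolled group used in the fixed pattern.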
Upvotes: 1