Vasoli

Reputation: 97

Visual C++ std::regex_search bug?

My string looks like this:

macd_at([{1036}].CLOSE,10,10,10).UPPER 

I am trying to match the following regex against this string:

([a-zA-Z][a-zA-Z0-9_]*_(at|AT)\((((\[\{[0-9]+\}\](\.(OPEN|CLOSE|LOW|HIGH))?)|[1-9][0-9]*\.?[0-9]*|(TRUE|FALSE)|\"[^"]*\"),)*((\[\{[0-9]+\}\](\.(OPEN|CLOSE|LOW|HIGH))?)|[1-9][0-9]*\.?[0-9]*|(TRUE|FALSE)|\"[^"]*\")\)(\.(VALUE|UPPER|LOWER|PRICE))?)

Online regex testers report a match for this pattern, but when I call std::regex_search it does not work. Is there a bug in the Visual C++ standard library?

When I change the string to

macd_at([{1036}],10,10,10).UPPER 

std::regex_search works. Is there a limit on how complicated a regex can be?

PS: The regex is built from the following pieces, which are easier to read than the full pattern:

const std::string NUMBER_REGEX_PATERN = "[1-9][0-9]*\\.?[0-9]*";
const std::string OPERATOR_REGEX_PATERN = "(\\*|/|-|\\+)";
const std::string SYMBOL_REGEX_PATERN = "\\[\\{[0-9]+\\}\\]";
const std::string SYMBOL_SUFFIX_REGEX_PATERN = "(\\.(OPEN|CLOSE|LOW|HIGH))";
const std::string SYMBOL_WHOLE_REGEX_PATERN = "(" + SYMBOL_REGEX_PATERN + SYMBOL_SUFFIX_REGEX_PATERN + "?)";
const std::string STRING_REGEX_PATERN = "\\\"[^\"]*\\\"";
const std::string BOOLIAN_REGEX_PATERN = "(TRUE|FALSE)";
const std::string LITERAL_REGEX_PATERN = "(" + SYMBOL_WHOLE_REGEX_PATERN + "|" + NUMBER_REGEX_PATERN + "|" + BOOLIAN_REGEX_PATERN +"|" + STRING_REGEX_PATERN + ")";

const std::string STUDY_NAME_REGEX_PATERN = "[a-zA-Z][a-zA-Z0-9_]*_(at|AT)";
const std::string STUDY_SUFFIX_REGEX_PATERN = "(\\.(VALUE|UPPER|LOWER|PRICE))";
const std::string WHOLE_STUDY_REGEX_PATERN = STUDY_NAME_REGEX_PATERN + "\\((" +LITERAL_REGEX_PATERN + ",)*"+ LITERAL_REGEX_PATERN + "\\)";
const std::string WHOLE_STUDY_WITH_SUFIX_REGEX_PATERN = "(" + WHOLE_STUDY_REGEX_PATERN + STUDY_SUFFIX_REGEX_PATERN + "?)";

Upvotes: 0

Views: 278

Answers (1)

Martin Ender

Reputation: 44259

Given the complexity of the pattern, excessive backtracking might be the problem. One place where you can reduce backtracking significantly is your second-to-last building block. Try changing

...(" +LITERAL_REGEX_PATERN + ",)*"+ LITERAL_REGEX_PATERN...

into

...LITERAL_REGEX_PATERN + "(," + LITERAL_REGEX_PATERN + ")*"...

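This is a simplified form of the unrolling-the-loop technique and reduces the amount of backtracking a lot. With "(literal,)*literal" the engine first matches the last argument inside the loop, fails to find the trailing comma, and has to backtrack and match that argument again; with "literal(,literal)*" each argument is matched only once. Note that both patterns match exactly the same strings.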
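As a rough sketch of that equivalence, the snippet below builds the argument list both ways, with the unrolled form written as LITERAL(,LITERAL)*, and compares the results on a given input. The building blocks are copied from the question; ORIGINAL_ARGS, UNROLLED_ARGS, matchesStudy and checkEquivalent are hypothetical names introduced only for this illustration, and the study suffix is left out to keep it short.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Building blocks copied from the question (identifier spelling kept as-is).
const std::string NUMBER_REGEX_PATERN = "[1-9][0-9]*\\.?[0-9]*";
const std::string SYMBOL_REGEX_PATERN = "\\[\\{[0-9]+\\}\\]";
const std::string SYMBOL_SUFFIX_REGEX_PATERN = "(\\.(OPEN|CLOSE|LOW|HIGH))";
const std::string SYMBOL_WHOLE_REGEX_PATERN =
    "(" + SYMBOL_REGEX_PATERN + SYMBOL_SUFFIX_REGEX_PATERN + "?)";
const std::string STRING_REGEX_PATERN = "\\\"[^\"]*\\\"";
const std::string BOOLIAN_REGEX_PATERN = "(TRUE|FALSE)";
const std::string LITERAL_REGEX_PATERN =
    "(" + SYMBOL_WHOLE_REGEX_PATERN + "|" + NUMBER_REGEX_PATERN + "|" +
    BOOLIAN_REGEX_PATERN + "|" + STRING_REGEX_PATERN + ")";
const std::string STUDY_NAME_REGEX_PATERN = "[a-zA-Z][a-zA-Z0-9_]*_(at|AT)";

// Hypothetical names, not part of the question's code.
// Original shape: (LITERAL,)*LITERAL
const std::string ORIGINAL_ARGS =
    "(" + LITERAL_REGEX_PATERN + ",)*" + LITERAL_REGEX_PATERN;
// Unrolled shape: LITERAL(,LITERAL)*
const std::string UNROLLED_ARGS =
    LITERAL_REGEX_PATERN + "(," + LITERAL_REGEX_PATERN + ")*";

// Returns true if the whole input is name(args) for the given argument pattern.
bool matchesStudy(const std::string& argsPattern, const std::string& input) {
    const std::regex re(STUDY_NAME_REGEX_PATERN + "\\(" + argsPattern + "\\)");
    return std::regex_match(input, re);
}

// Asserts that both argument patterns agree on the given input.
void checkEquivalent(const std::string& input) {
    assert(matchesStudy(ORIGINAL_ARGS, input) ==
           matchesStudy(UNROLLED_ARGS, input));
}
```

Calling checkEquivalent("macd_at([{1036}].CLOSE,10,10,10)") exercises the string from the question. This only checks that the two forms agree on what they match; it does not measure how much backtracking each one does.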

Another point to optimize:

If you don't need all the capturing groups (and I doubt you need them, because some of them get overwritten in the repetition), turn them into non-capturing groups. E.g.

(?:\\.(?:OPEN|CLOSE|LOW|HIGH))

Especially in conjunction with backtracking, unnecessary capturing can get quite expensive.
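To illustrate the difference, here is a small sketch that compiles the symbol suffix both ways and uses std::regex::mark_count() to show how many capture groups the engine has to track. CAPTURING_SUFFIX, NON_CAPTURING_SUFFIX, countGroups, matchesSymbol and checkSameResult are hypothetical names for this example only; the sub-patterns themselves come from the question.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Suffix as written in the question: two capture groups.
const std::string CAPTURING_SUFFIX = "(\\.(OPEN|CLOSE|LOW|HIGH))";
// Same suffix with non-capturing groups: nothing to record while matching.
const std::string NON_CAPTURING_SUFFIX = "(?:\\.(?:OPEN|CLOSE|LOW|HIGH))";

// Number of capture groups in a pattern (the whole match is not counted).
unsigned countGroups(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}

// Returns true if the whole input is a symbol with an optional suffix.
bool matchesSymbol(const std::string& suffixPattern, const std::string& input) {
    const std::regex re("\\[\\{[0-9]+\\}\\]" + suffixPattern + "?");
    return std::regex_match(input, re);
}

// Asserts that both suffix variants accept or reject the input alike.
void checkSameResult(const std::string& input) {
    assert(matchesSymbol(CAPTURING_SUFFIX, input) ==
           matchesSymbol(NON_CAPTURING_SUFFIX, input));
}
```

Both variants accept the same inputs, for example "[{1036}].CLOSE" and "[{1036}]"; only the bookkeeping differs. The same substitution can be applied to the other groups in the building blocks whose captured text is never read.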

Upvotes: 1
