Reputation: 97
My string looks like this:
macd_at([{1036}].CLOSE,10,10,10).UPPER
I am trying to match it against this regex:
([a-zA-Z][a-zA-Z0-9_]*_(at|AT)\((((\[\{[0-9]+\}\](\.(OPEN|CLOSE|LOW|HIGH))?)|[1-9][0-9]*\.?[0-9]*|(TRUE|FALSE)|\"[^"]*\"),)*((\[\{[0-9]+\}\](\.(OPEN|CLOSE|LOW|HIGH))?)|[1-9][0-9]*\.?[0-9]*|(TRUE|FALSE)|\"[^"]*\")\)(\.(VALUE|UPPER|LOWER|PRICE))?)
Online regex testers report a match, but when I call std::regex_search it does not work. Is there a bug in the Visual Studio C++ standard library?
When I change the string to
macd_at([{1036}],10,10,10).UPPER
std::regex_search works. Is there a limit on how complex a regex can be?
PS: The regex is built from the following pieces (shown here to make it easier to read):
const std::string NUMBER_REGEX_PATERN = "[1-9][0-9]*\\.?[0-9]*";
const std::string OPERATOR_REGEX_PATERN = "(\\*|/|-|\\+)";
const std::string SYMBOL_REGEX_PATERN = "\\[\\{[0-9]+\\}\\]";
const std::string SYMBOL_SUFFIX_REGEX_PATERN = "(\\.(OPEN|CLOSE|LOW|HIGH))";
const std::string SYMBOL_WHOLE_REGEX_PATERN = "(" + SYMBOL_REGEX_PATERN + SYMBOL_SUFFIX_REGEX_PATERN + "?)";
const std::string STRING_REGEX_PATERN = "\\\"[^\"]*\\\"";
const std::string BOOLIAN_REGEX_PATERN = "(TRUE|FALSE)";
const std::string LITERAL_REGEX_PATERN = "(" + SYMBOL_WHOLE_REGEX_PATERN + "|" + NUMBER_REGEX_PATERN + "|" + BOOLIAN_REGEX_PATERN +"|" + STRING_REGEX_PATERN + ")";
const std::string STUDY_NAME_REGEX_PATERN = "[a-zA-Z][a-zA-Z0-9_]*_(at|AT)";
const std::string STUDY_SUFFIX_REGEX_PATERN = "(\\.(VALUE|UPPER|LOWER|PRICE))";
const std::string WHOLE_STUDY_REGEX_PATERN = STUDY_NAME_REGEX_PATERN + "\\((" +LITERAL_REGEX_PATERN + ",)*"+ LITERAL_REGEX_PATERN + "\\)";
const std::string WHOLE_STUDY_WITH_SUFIX_REGEX_PATERN = "(" + WHOLE_STUDY_REGEX_PATERN + STUDY_SUFFIX_REGEX_PATERN + "?)";
Upvotes: 0
Views: 278
Reputation: 44259
Given the complexity of the pattern, excessive backtracking might be the problem. One point where you can reduce backtracking significantly is your second-to-last building block. Try changing
...(" +LITERAL_REGEX_PATERN + ",)*"+ LITERAL_REGEX_PATERN...
into
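...LITERAL_REGEX_PATERN + "(," + LITERAL_REGEX_PATERN + ")*"...
This is a simplified form of the unrolling-the-loop technique: one literal first, then zero or more repetitions of a comma followed by a literal. Each repetition now has to start with a comma, so the engine no longer matches a whole literal before finding out whether it belongs to the loop or is the final argument, which reduces backtracking a lot. Note that both patterns match exactly the same strings.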
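To make the unrolled form concrete, here is a minimal sketch that rebuilds the pattern from the question's building blocks with the argument list written as one literal followed by zero or more ",literal" repetitions. The names UNROLLED_STUDY_REGEX_PATERN and find_study are made up for this illustration, and the string-literal block is written with a plain quote instead of an escaped one. I have not verified that this alone cures the behaviour in the VS library; it only shows the rewritten pattern.

```cpp
#include <regex>
#include <string>

// Building blocks taken from the question (identifier spelling kept as-is).
const std::string NUMBER_REGEX_PATERN = "[1-9][0-9]*\\.?[0-9]*";
const std::string SYMBOL_REGEX_PATERN = "\\[\\{[0-9]+\\}\\]";
const std::string SYMBOL_SUFFIX_REGEX_PATERN = "(\\.(OPEN|CLOSE|LOW|HIGH))";
const std::string SYMBOL_WHOLE_REGEX_PATERN =
    "(" + SYMBOL_REGEX_PATERN + SYMBOL_SUFFIX_REGEX_PATERN + "?)";
// Simplification: a plain double quote needs no regex escape.
const std::string STRING_REGEX_PATERN = "\"[^\"]*\"";
const std::string BOOLIAN_REGEX_PATERN = "(TRUE|FALSE)";
const std::string LITERAL_REGEX_PATERN =
    "(" + SYMBOL_WHOLE_REGEX_PATERN + "|" + NUMBER_REGEX_PATERN + "|" +
    BOOLIAN_REGEX_PATERN + "|" + STRING_REGEX_PATERN + ")";
const std::string STUDY_NAME_REGEX_PATERN = "[a-zA-Z][a-zA-Z0-9_]*_(at|AT)";
const std::string STUDY_SUFFIX_REGEX_PATERN = "(\\.(VALUE|UPPER|LOWER|PRICE))";

// Hypothetical name: the unrolled argument list, literal ( , literal )*
const std::string UNROLLED_STUDY_REGEX_PATERN =
    STUDY_NAME_REGEX_PATERN + "\\(" + LITERAL_REGEX_PATERN +
    "(," + LITERAL_REGEX_PATERN + ")*\\)" + STUDY_SUFFIX_REGEX_PATERN + "?";

// Hypothetical helper: returns the first study expression found in text,
// or an empty string when there is none.
std::string find_study(const std::string& text) {
    static const std::regex re(UNROLLED_STUDY_REGEX_PATERN);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

Calling find_study on either of the two strings from the question should return the whole string, while an argument list with a trailing comma such as macd_at(10,) should yield an empty result.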
Another point to optimize:
If you don't need all the capturing groups (and I doubt you need them, because some of them get overwritten in the repetition), turn them into non-capturing groups. E.g.
(?:\\.(?:OPEN|CLOSE|LOW|HIGH))
Especially in conjunction with backtracking, unnecessary capturing can get quite expensive.
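As a small illustration of the difference, the sketch below compares the capturing and the non-capturing form of the symbol block using std::regex::mark_count(), which reports how many capture groups a compiled pattern contains. The constant and function names (SYMBOL_CAPTURING, SYMBOL_NON_CAPTURING, capture_group_count, matches_whole) are invented for this example; it only counts groups and does not measure how much time is actually saved.

```cpp
#include <regex>
#include <string>

// Symbol block as in the question: three capturing groups.
const std::string SYMBOL_CAPTURING =
    "(\\[\\{[0-9]+\\}\\](\\.(OPEN|CLOSE|LOW|HIGH))?)";

// Same block with every group turned into a non-capturing (?:...) group.
const std::string SYMBOL_NON_CAPTURING =
    "(?:\\[\\{[0-9]+\\}\\](?:\\.(?:OPEN|CLOSE|LOW|HIGH))?)";

// Hypothetical helper: number of capture groups the engine must track.
unsigned capture_group_count(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}

// Hypothetical helper: true when the whole text matches the pattern.
bool matches_whole(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

Both forms accept the same inputs, for instance [{1036}].CLOSE and [{1036}], but the non-capturing one leaves the engine no sub-matches to record or restore while it backtracks.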
Upvotes: 1