Wakan Tanka

Reputation: 8042

Regexp capturing crashes the code

I'm trying to figure out how regex in C++ works, so I wrote this example where I try different regexps and see whether they match:

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main(){

    while (true) {
        string needle;
        cin >> needle;
        regex regexp(needle);
        std::smatch smatch;
        string haystack = "caps.caps[0].MainFormat[0].Video.BitRateOptions = 896, 1536";

        bool match = regex_search(haystack, smatch, regexp);

        if (match) {
            cout << "Matched" << endl;
        }
        else {
            cout << "Mismatch" << endl;
        }
    }
}

Here are the results:

caps.caps[0].MainFormat[0].Video.BitRateOptions
Mismatch
(caps.caps[0].MainFormat[0].Video.BitRateOptions)
Mismatch
caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions
Matched
(caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions)
Matched
caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions=
Mismatch
(caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions=)
Mismatch
caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions =
Matched
Matched
(caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions =)
THIS ONE BREAKS THE PROCESS AND ENDS
caps.caps\[0]
THIS ONE BREAKS THE PROCESS AND ENDS

Why does caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions = return two matches, and why does wrapping that regex in a capture group crash the code? Based on this I assume that I need to escape '[' and ']' when I want to match them literally, and that there may be other cases where a badly constructed regexp crashes the process. Is there any option that handles an unescaped '[' or ']' and other invalid regexps, so that the code reports a mismatch instead of crashing? I'm using Visual Studio 2017 on Windows 10. Thanks

Upvotes: 1

Views: 299

Answers (1)

Olaf Dietsche

Reputation: 74028

The first one

caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions =

returns two matches, because std::cin >> needle; reads only until the first whitespace character is found (first match). Then it reads the next "word" =, which gives the second match.


Similar behaviour happens with the next one

(caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions =)

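Only the first part, (caps\.caps\[0\]\.MainFormat\[0\]\.Video\.BitRateOptions, is read, up to but excluding the first whitespace. The opening parenthesis is never closed, so the regular expression is incomplete and the std::regex constructor throws an exception. With g++ this looks like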

terminate called after throwing an instance of 'std::regex_error'
what(): regex_error

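The process ends because nothing catches that exception. As a minimal sketch of how an invalid pattern could be reported as a mismatch instead, the construction and the search can be wrapped in a try/catch. The helper name search_or_mismatch is hypothetical and not part of the original code:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true only if `pattern` is a valid regex
// that matches somewhere in `haystack`. An invalid pattern (for example
// an unclosed parenthesis) is treated as a mismatch instead of aborting.
bool search_or_mismatch(const std::string& haystack, const std::string& pattern) {
    try {
        const std::regex re(pattern);            // throws std::regex_error if invalid
        return std::regex_search(haystack, re);
    } catch (const std::regex_error&) {
        return false;                            // treat a bad pattern as "Mismatch"
    }
}
```

In the loop from the question, this would replace the regex construction and the regex_search call. Whether a bad pattern should silently count as a mismatch or be reported to the user (e.g. via what()) is a design choice.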

If you want the complete line, use std::getline instead

while (std::getline(std::cin, needle)) {
// ...
}
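The difference between the two ways of reading can be checked without typing anything, by feeding the same text through a std::istringstream. This is only an illustrative sketch; read_words and read_lines are hypothetical helper names:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads patterns the way `std::cin >> needle` does:
// one pattern per whitespace-delimited word.
std::vector<std::string> read_words(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> out;
    std::string needle;
    while (in >> needle) out.push_back(needle);
    return out;
}

// Reads patterns with std::getline:
// one pattern per line, embedded spaces preserved.
std::vector<std::string> read_lines(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> out;
    std::string needle;
    while (std::getline(in, needle)) out.push_back(needle);
    return out;
}
```

For the input (caps\.caps =), read_words yields the two fragments (caps\.caps and =), neither of which is a valid regex on its own, while read_lines yields the whole pattern as one string.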

I cannot reproduce any abort with the final one

caps.caps\[0]

This returns a match as expected.

Upvotes: 2
