Tiago.SR

Reputation: 369

Odd regex matches that differ from the tester

I have the following regex to match HTTP headers and a body in an input string:

([^()<>@,;:\\\"/\\[\\]?={}\\s\\t]+):(?:[\\s\\t]+)?(.+)\\r\\n(?:\\r\\n([\\s\\S]+))?

The parentheses below show the expected matches:

(Header-Name): (Its_value)
(Im-a-header): (Im_a_value)

(Anything here,
commonly HTML code...
...)

It works fine on Regex101.com with the PCRE, Python, and JavaScript flavors, but when I test it in C++ using regex_search, only the first header is matched and nothing else, not even the body. Using the Perl flavor from boost::regex produces even stranger output.

Test code:

#include <regex>
#include <string>
#include <iostream>

int main()
{
        const std::string data("Name: value\r\nFoo: bar\r\n\r\nanything\r\nhere");
        std::regex pattern("([^()<>@,;:\\\"/\\[\\]?={}\\s\\t]+):(?:[\\s\\t]+)?(.+)\\r\\n(?:\\r\\n([\\s\\S]+))?");
        std::smatch result;

        std::regex_search(data, result, pattern);

        for(const auto &match : result)
                std::cout << match << std::endl;
}

Output:

Name: value

Name
value

Output after switching from std to boost (which automatically switches to the Perl flavor):

Name: value
Foo: bar

anything here

Name
value
Foo: bar

anything here

Note: I used boost only to compare the resulting output. I don't want a Perl-specific solution.

I would like to get an output similar to the following with such code:

Name
value
Foo
bar
anything
here

Can somebody tell what the problem is and help me with this, please?

Upvotes: 1

Views: 61

Answers (1)

Wiktor Stribiżew

Reputation: 627128

It seems there are several issues.

  1. You need to run regex_search several times to obtain several matches, each having its own capturing groups.
  2. Since you will need to modify the input string, you must declare it as non-const.
  3. The regex itself places anything\r\nhere into group 3, so you should check whether that group matched before trying to print or read it.

Here is a fixed version:

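#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Not const: each iteration replaces it with the unmatched remainder.
    std::string data("Name: value\r\nFoo: bar\r\n\r\nanything\r\nhere");
    std::regex pattern("([^()<>@,;:\\\\\"/\\[\\]?={}\\s]+):\\s*(.+)\r\n(?:\r\n([\\s\\S]+))?");
    std::smatch result;

    while (std::regex_search(data, result, pattern)) {
        // Groups 1 and 2 hold the header name and value.
        std::cout << result[1] << "\n" << result[2] << std::endl;
        // Group 3 (the body) matches only when a blank line follows the header.
        if (result[3].matched) {
            std::cout << result[3] << std::endl;
        }
        // suffix().str() copies the remainder before data is overwritten.
        data = result.suffix().str();
    }
}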

See IDEONE demo. Output:

Name
value
Foo
bar
anything
here
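
If you would rather keep the input const, one possible alternative is to walk the matches with std::sregex_iterator instead of repeatedly cutting the string. The sketch below only illustrates that idea: ParsedMessage and parse_message are hypothetical names that do not appear in the original code, and the pattern is the same one used in the fixed version.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for the parsed pieces (not from the original post).
struct ParsedMessage {
    std::vector<std::pair<std::string, std::string>> headers;
    std::string body;  // stays empty when no body was matched
};

// Hypothetical helper: iterates over every match with std::sregex_iterator,
// so the input stays const and is never truncated.
ParsedMessage parse_message(const std::string& data) {
    // Same pattern as in the fixed version.
    static const std::regex pattern(
        "([^()<>@,;:\\\\\"/\\[\\]?={}\\s]+):\\s*(.+)\r\n(?:\r\n([\\s\\S]+))?");

    ParsedMessage out;
    const std::sregex_iterator end;  // default-constructed iterator marks the end
    for (std::sregex_iterator it(data.begin(), data.end(), pattern); it != end; ++it) {
        const std::smatch& m = *it;
        // Groups 1 and 2: header name and value.
        out.headers.emplace_back(m[1].str(), m[2].str());
        // Group 3 is optional, so check that it took part in the match.
        if (m[3].matched) {
            out.body = m[3].str();
        }
    }
    return out;
}
```

Called with the test string from the question, this should collect the two header pairs (Name/value and Foo/bar) and leave the text after the blank line in body. The input must outlive the iteration, because the match objects refer to positions inside it.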

Upvotes: 1
