Xu Wang

Reputation: 10597

C++ regex not understanding

The following outputs ">Hut" where I expect it to output "Hut". I know that .* is greedy but > must be matched and it is outside of the capture group so why is it in my submatch?

#include <string>
#include <regex>
#include <iostream>

using namespace std;

int main() {
        regex my_r(".*>(.*)");
        string temp(R"~(cols="64">Hut)~");
        smatch m;
        if (regex_match(temp, m, my_r)) {
                cout << m[1] << endl;
        }
}

Upvotes: 8

Views: 842

Answers (2)

LihO

Reputation: 42083

You can modify your regular expression so that matched parts are divided into groups:

std::regex my_r("(.*)>(.*)\\).*"); // group1>group2).*
std::string temp("~(cols=\"64\">Hut)~");
std::sregex_iterator reg_it(temp.begin(), temp.end(), my_r);

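// Check for the end iterator first: dereferencing it when nothing matched is undefined behavior.
if (reg_it != std::sregex_iterator() && reg_it->size() > 1) {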
    std::cout
        << "1: " << reg_it->str(1) << std::endl  // group1 match
        << "2: " << reg_it->str(2) << std::endl; // group2 match
}

outputs:

1: ~(cols="64"
2: Hut

Note that groups are delimited by parentheses ( /* your regex here */ ). If you want a parenthesis to be a literal part of your expression, you need to escape it with \, which is written \\ in a C++ string literal. For more information, see Grouping Constructs.

This question can also help you: How do I loop through results from std::regex_search?

Also, avoid putting using namespace std; at the top of your files; it is widely considered bad practice.

Upvotes: 3

kennytm

Reputation: 523294

This is a bug in libstdc++'s implementation. Compare these two:

#include <string>
#include <regex>
#include <boost/regex.hpp>
#include <iostream>

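int main() {
    // Use a named string so the iterators stored in the match results stay valid;
    // since C++14 the std::regex_match overload taking a temporary string is deleted.
    const std::string input{"123456789"};

    {
        using namespace std;    // std::regex (libstdc++ when built with gcc)
        regex my_r("(.*)(6)(.*)");
        smatch m;
        if (regex_match(input, m, my_r)) {
            std::cout << m.length(1) << ", "
                      << m.length(2) << ", "
                      << m.length(3) << std::endl;
        }
    }

    {
        using namespace boost;  // boost::regex
        regex my_r("(.*)(6)(.*)");
        smatch m;
        if (regex_match(input, m, my_r)) {
            std::cout << m.length(1) << ", "
                      << m.length(2) << ", "
                      << m.length(3) << std::endl;
        }
    }

    return 0;
}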

If you compile with gcc, the first block (libstdc++) prints the clearly wrong result 9, -2, 4, while the second block (Boost's implementation) prints 5, 1, 3 as expected.

If you compile with clang + libc++, your code works fine.

(Note that libstdc++'s regex implementation is only "partially supported", as described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52719.)

Upvotes: 7
