3ka5_cat

Reputation: 131

C++11 VS12 regex_search

I'm trying to retrieve numbers from a string. The string has a format like _0_1_, and I want to get 0 and 1.

Here is my code:

std::tr1::regex rx("_(\\d+)_");
tstring fileName = Utils::extractFileName(docList[i]->c_str());                 
std::tr1::smatch res;
std::tr1::regex_search(fileName, res, rx);

but as a result I get (UPDATED: these are strange outputs from the debugger watch window):

res[0] = 3
res[1] = 1

Where did the 3 come from, and what am I doing wrong?

UPDATED: I printed the results to the screen:

for (std::tr1::smatch::iterator it = res.begin(); it != res.end(); ++it) {
    std::cout << *it << std::endl;
}

And the program output:

_0_
0

Upvotes: 2

Views: 231

Answers (3)

3ka5_cat

Reputation: 131

Solution: Thanks to all. I rewrote it using regex_token_iterator and the pattern (\\d+). Now it works:

std::regex rx("(\\d+)");  // match each run of digits
tstring fileName = Utils::extractFileName(docList[i]->c_str());
// A default-constructed token iterator is the end-of-sequence marker.
std::regex_token_iterator<tstring::iterator> rend;
for (std::regex_token_iterator<tstring::iterator> it(fileName.begin(), fileName.end(), rx); it != rend; ++it) {
    std::cout << " [" << *it << "]";
}

Upvotes: 1

user7116

Reputation: 64068

This appears to be the expected output. The first element of the result (res[0]) is the entire substring that matched; the following elements (res[1] and so forth) are the capture groups.
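To make the distinction concrete, here is a minimal sketch using the standard C++11 names (std::regex and std::smatch rather than the std::tr1 spellings from the question). The helper name first_match is made up for illustration; it returns the whole match and the first capture group as a pair.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns {whole match, first capture group} for the
// first "_<digits>_" hit, or two empty strings when nothing matches.
std::pair<std::string, std::string> first_match(const std::string& text) {
    static const std::regex rx("_(\\d+)_");
    std::smatch res;
    if (!std::regex_search(text, res, rx)) {
        return std::make_pair(std::string(), std::string());
    }
    // res[0] is the entire matched substring, res[1] is the first group.
    return std::make_pair(res[0].str(), res[1].str());
}
```

Calling first_match("_0_1_") gives the pair ("_0_", "0"), which is the same two values the program in the question printed.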

If you'd like to go through all matches, you'll need to call regex_search multiple times to get each match:

auto it = fileName.cbegin();
while (std::tr1::regex_search(it, fileName.cend(), res, rx)) {
    std::cout << "Found matching group:" << std::endl;
    for (std::size_t mm = 1; mm < res.size(); ++mm) {
        std::cout << std::string(res[mm].first, res[mm].second) << std::endl;
    }

    it = res[0].second; // resume just past the end of this match
}

If you really need only the numbers "wrapped" in underscores, you can use a positive lookahead assertion (?=_) to ensure this:

// A positive lookahead must match, but the text it matches is not consumed,
// so the trailing underscore is still available to start the next match.
std::tr1::regex rx("_(\\d+)(?=_)");

When run against "//abc_1_2_3.txt", this retrieves 1 and 2, but not 3, because 3 is not followed by an underscore.
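As a rough sketch of how that pattern could be run over a whole string, the following uses std::sregex_iterator instead of a manual regex_search loop. It assumes the standard C++11 names (std::regex, not std::tr1::regex) and the default ECMAScript grammar; the function name wrapped_numbers is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every number that is preceded by '_' and
// followed by '_', using a lookahead so the trailing '_' is not consumed.
std::vector<std::string> wrapped_numbers(const std::string& text) {
    static const std::regex rx("_(\\d+)(?=_)");
    std::vector<std::string> out;
    // A default-constructed sregex_iterator marks the end of the matches.
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it) {
        out.push_back((*it)[1].str());  // group 1 holds just the digits
    }
    return out;
}
```

For the input "//abc_1_2_3.txt" this yields "1" and "2", and for "_0_1_" it yields "0" and "1".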

Upvotes: 2

6502

Reputation: 114481

A regexp search normally returns only non-overlapping matches. If you require _ both before and after each number, you will not get all the numbers, because the underscore after the first number is consumed by that match and cannot also serve as the underscore before the second number:

_123_456_
    ^
    This cannot be used twice

Just use (\\d+) as the expression to get all numbers (regexps are "greedy" by default, so each match takes all the available consecutive digits anyway).
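The difference between the two patterns can be checked with a small sketch like the one below. It assumes the standard C++11 std::regex names; find_all is a made-up helper that returns the full text of every non-overlapping match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the whole text of each non-overlapping match
// of `pattern` in `text`, in the order the matches are found.
std::vector<std::string> find_all(const std::string& text, const std::string& pattern) {
    const std::regex rx(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it) {
        out.push_back(it->str());  // str() with no argument is the whole match
    }
    return out;
}
```

With the input "_123_456_", find_all(text, "_(\\d+)_") finds only "_123_", since the shared underscore is already consumed, while find_all(text, "\\d+") finds both "123" and "456".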

Upvotes: 2
