Hemant Bhargava

Reputation: 3585

avoid regex greediness

Basic regex question.

By default, regular expressions seem to be greedy. For example, consider the code below:

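#include <cstddef>
#include <iostream>
#include <regex>
#include <string>

int main() {
  const std::string t = "*1 abc";
  std::smatch match;
  // Two capture groups: the digits after '*', then the rest of the line
  std::regex rgxx("\\*(\\d+?)\\s+(.+?)$");
  bool matched1 = std::regex_search(t.begin(), t.end(), match, rgxx);
  if (!matched1) return 1;
  std::cout << "Matched size " << match.size() << std::endl;

  // Print every element of the match results, starting at index 0
  for (std::size_t i = 0; i < match.size(); ++i) {
    std::cout << i << " match " << match[i] << std::endl;
  }
}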

This will produce an output of:

Matched size 3
0 match *1 abc
1 match 1
2 match abc

As a general regular expression writer, I would have expected only

1 match 1
2 match abc

to be printed. I think the first match (match 0) appears because of regex greediness. How can I avoid it?

Upvotes: 0

Views: 138

Answers (2)

Caleth

Reputation: 63117

You only have one match. That match has two "marked subexpressions" (capture groups), because the regex contains two parenthesized groups. You don't have multiple matches of that regex.

From the std::regex_search documentation:

m.size(): number of marked subexpressions plus 1, that is, 1+rgxx.mark_count()

If you are looking for multiple matches, use std::regex_iterator.

Upvotes: 0

kmdreko

Reputation: 60493

From std::regex_search: match[0] is not the result of greedy evaluation; it is the range of the entire match. The elements match[1] through match[n-1] (the half-open range [1, n)) are the capture groups.

Here's an illustration of what the match results mean:

regex     "hello ([\\w]+)"

string   = "Oh, hello John!"
match[0] =     "hello John"   // matches the whole regex above
match[1] =           "John"   // the first capture group

Upvotes: 1
