Reputation: 249
Consider the following code:
wstring String = L"HMM";
wsmatch Matches;
if( regex_match( String, Matches, wregex( L"(H+|M+)([^HM]*)(H+|M+)([^HM]*)(H+|M+)" ) ) )
    wcout << Matches[1].str() << L"-" << Matches[3].str() << L"-" << Matches[5].str() << endl;
else
    wcout << L"No match\n";
You would expect the second M+ to greedily consume all the M's and give us No match as the result. Instead we get H-M-M.
It looks like VS13 tries to maximise the number of submatches and sacrifices greediness to do so?
Upvotes: 1
Views: 172
Reputation: 75252
You would expect that the second M+ would greedily consume all M's and give us a No match as result.
The behavior you're describing is possessive, not greedy. A greedy quantifier initially consumes as much as it can, but then gives back as much as necessary to achieve an overall match.
In flavors that support them, possessive quantifiers are formed by adding a + after the normal quantifier. This regex would behave the way you expect:
(H++|M++)([^HM]*+)(H++|M++)([^HM]*+)(H++|M++)
However, the flavor you're using doesn't support possessive quantifiers. Atomic groups can be used for the same thing, but you don't have those either.
Upvotes: 1
Reputation: 627390
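There are 5 capturing groups in your regex. All of them contain greedy patterns. Greedy patterns take as many characters as they can, but they still give characters back to the neighboring groups when that is the only way for the overall match to succeed.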
(H+|M+) - Greedily matches Hs or Ms, but only up to the next character that is not H or M, or up to an empty string. Why is the empty string included? Because...
([^HM]*) - Greedily matches an empty string or any run of characters other than H and M.
(H+|M+) - Again, greedily matches one or more Hs or Ms, up to an empty string or a character other than H or M.
([^HM]*) - Same as above.
(H+|M+) - Same as above.
On regex101.com, captured empty strings are listed under non-participating groups; you can turn them on in the site options.
You can also check how your regex behaves at http://regexstorm.net/tester. On the Tables tab, you will always see all captured substrings.
Upvotes: 1
Reputation: 2553
+ means one or more, and you have three groups that demand one or more H's and M's, which is why you get one of them in each group.
(H+|M+)  # Would like to match H
([^HM]*) # Accepted since you allow 0 occurrences
(H+|M+)  # Would like to match MM, but can only match M
([^HM]*) #
(H+|M+)  # This NEEDS to match the last H or M
Greediness means it will match as much as possible, but it will never sacrifice a match for greediness.
Upvotes: 2