Reputation: 35270
Why does this regular expression:
Summary:(\r\n\t\t\/\/ (.+))+
not match the last line of the following string? (NOTE: the whitespace at the beginning of each line is two tab characters, but it has been converted to spaces, at least in my browser; it's correct in edit mode, though.) Shouldn't the + quantifier cause the part of the pattern in the outermost parens to match the last line too?
        //
        // Summary:
        // Do absolutely nothing and don't do anything else other than to do nothing at
        // all.
Here's the result on http://regexstorm.net/tester:
Upvotes: 2
Views: 61
Reputation: 626747
This looks like a bug at first, but it follows from how greedy matching and backtracking work. Look at what is happening:

Summary: is matched first.

(\r\n\t\t// (.+))+ then runs its first iteration and grabs "\r\n\t\t// Do absolutely nothing and don't do anything else other than to do nothing at\r". Pay attention to the last \r: in a .NET regex, . matches a CR symbol by default, since it matches any character except \n.

The + quantifier then signals the regex engine to try the group again on the substring to the right of the current match, i.e. "\n\t\t// all.". It cannot match, because that substring starts with \n while the group requires \r\n.

For the last line to match, the pattern would have to expand like "\r\n\t\t// (.+)\r\n\t\t// (.+)" and so on, i.e. \r\n\t\t// (.+)(?:\r\n\t\t// (.+))*, and that requires (.+) to backtrack. The regex engine does have a way to re-match the string differently, as .+ qualifies for backtracking, but it only backtracks when the rest of the pattern fails. Here a single iteration already satisfies +, and nothing follows the group, so the match succeeds as it stands and the . that matched the CR never gives it back.

The workaround is to either match the first \r as an optional symbol:
Summary:(\r?\n\t\t// (.+))+
Or, just match any chars but CR and LF with [^\r\n]+ (this will ensure cleaner values in the Group 2 capture stack):
Summary:(\r\n\t\t// ([^\r\n]+))+
See the regex demo.
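To reproduce this outside the tester, here is a minimal sketch in Python rather than .NET. It assumes Python's re module behaves like .NET on this point: in both, . matches any character except \n, so it does match \r. The sample text is reconstructed from the question (the exact spacing after // is a guess), the helper name last_line_matched is made up for illustration, and the \Z variant is an extra check that is not part of the answer above: it only shows that .+ does give the CR back once something after the group forces the engine to backtrack.

```python
import re

# Sample text from the question: two tabs before each "//", CRLF line endings.
# The spacing after "//" is an assumption; the original layout was collapsed.
text = (
    "\t\t//\r\n"
    "\t\t// Summary:\r\n"
    "\t\t// Do absolutely nothing and don't do anything else other than to do nothing at\r\n"
    "\t\t// all."
)


def last_line_matched(pattern: str) -> bool:
    """Return True if the overall match extends through the final 'all.' line."""
    m = re.search(pattern, text)
    return m is not None and m.group(0).endswith("all.")


# Pattern from the question: (.+) swallows the trailing CR, so the second
# iteration of the group cannot find "\r\n" and the match stops after line one.
original = r"Summary:(\r\n\t\t// (.+))+"

# Workaround 1: make the leading CR optional.
optional_cr = r"Summary:(\r?\n\t\t// (.+))+"

# Workaround 2: never let the inner group consume CR or LF.
no_line_breaks = r"Summary:(\r\n\t\t// ([^\r\n]+))+"

# Extra check (not from the answer): an end-of-string anchor after the group
# makes the one-iteration match fail, which forces (.+) to backtrack.
anchored = r"Summary:(\r\n\t\t// (.+))+\Z"

for name, pattern in [
    ("original", original),
    ("optional_cr", optional_cr),
    ("no_line_breaks", no_line_breaks),
    ("anchored", anchored),
]:
    m = re.search(pattern, text)
    # repr() makes a stray "\r" at the end of a capture visible.
    print(name, last_line_matched(pattern), repr(m.group(2)))
```

Python keeps only the last capture of a repeated group, so group(2) here corresponds to the top of the Group 2 capture stack in .NET, not the whole stack.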
Upvotes: 2