Reputation: 695
The .NET implementation of Regex defines the '?' character as a greedy quantifier that tells its expression to match 0 or 1 times, preferring 1 if possible.
Consider the following source text:
some text (some parenthetical text)
And the following regex:
\A(.+)(?:\s\(.+\))?$
The result should be one matching group with the value:
some text
Instead, the group captures the whole line. When I remove the greedy 0 or 1 quantifier '?' from the regex, I do get the expected result. However, since the parenthetical text may not exist in my input, I can't leave that quantifier off. How do I force it to be greedy?
Upvotes: 3
Views: 903
Reputation: 2834
This doesn't match the way you think it will because (.+) is greedy.
Let me explain:
(.+) is greedy, so it will immediately match the entire string.
(?:\s\(.+\))? is also greedy. However, greedy only means it prefers to match; because it is optional, it doesn't have to match if the overall match can succeed without it.
Take this example:
string: abc123
regex: (.+)(\d{3})?
.+ will start out matching all of abc123. The regex engine is now at the end of the string (there are no characters left) and sees (\d{3})? next. The engine would prefer to match \d{3} if possible, but .+ has already consumed the entire string. Since the group is optional, the engine lets it match zero times and the overall match still succeeds, so it never backtracks to give the digits back.
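The abc123 example can be checked with a short script. This is a minimal sketch in Python rather than .NET, on the assumption that Python's re module backtracks the same way as the .NET engine for a pattern this simple:

```python
import re

# Greedy (.+) consumes the whole string; the optional group then matches
# zero times, so the engine never has to backtrack.
m = re.match(r"(.+)(\d{3})?", "abc123")

print(m.group(1))  # abc123
print(m.group(2))  # None (the optional group did not participate)
```

The second group comes back empty even though the string ends in three digits, which is the same effect as in the original regex.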
Your best bet is to make the first section lazy and keep the last section greedy.
\A(.+)(?:\s\(.+\))?$ will become \A(.+?)(?:\s\(.+\))?$
(.+?) will try to match as few characters as possible, so it leaves room for the optional second half. If the second half can't match (there is no parenthetical text), the $ anchor forces (.+?) to expand until it has consumed the rest of the string.
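To compare the two patterns side by side, here is a rough sketch, again in Python and not .NET, assuming both engines treat \A, $ and the lazy quantifier the same way for these inputs. The variable names are only for illustration:

```python
import re

greedy = re.compile(r"\A(.+)(?:\s\(.+\))?$")   # original pattern
lazy = re.compile(r"\A(.+?)(?:\s\(.+\))?$")    # first group made lazy

with_parens = "some text (some parenthetical text)"
without_parens = "some text"

# Original: group 1 swallows the whole line.
print(greedy.match(with_parens).group(1))   # some text (some parenthetical text)

# Lazy first group: the optional part gets a chance to match.
print(lazy.match(with_parens).group(1))     # some text

# No parenthetical text: $ forces the lazy group to take everything.
print(lazy.match(without_parens).group(1))  # some text
```

The optional group stays greedy in both versions; only the first group changes.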
Here's regex101 with an example (I changed \A to ^ so multiline would work)
Upvotes: 3