System.Cats.Lol

Reputation: 1770

Why does Regex.Match include noncapturing groups in the result?

In matching a regular expression, I want to exclude noncapturing groups from the result. I incorrectly assumed that they'd be excluded by default since, well, they're called noncapturing groups.

For some reason, though, Regex.Match behaves as though I hadn't even specified a noncapturing group. Try running this in the Immediate window:

System.Text.RegularExpressions.Regex.Match("b3a",@"(?:\d)\w").Value

I expected the result to be

"a"

but it's actually

"3a"

This question suggested I look at the Groups, but there is only one Group in the result and it too is "3a". It contains one Capture, also "3a".

What's going on here? Is Regex bugged, or is there an option I need to set?

Upvotes: 3

Views: 350

Answers (3)

O. R. Mapper

Reputation: 20731

You are misunderstanding the purpose of noncapturing groups.

In general, groups (defined by a pair of parentheses ()) mean two things:

  • The contained regular expression is grouped, so any quantifier after the closing parenthesis applies to the whole group rather than just the preceding single character.
  • The substring matched by the group is stored as a subcapture in the Groups property.

Sometimes you do not want the second effect for certain groups, which is why noncapturing groups were introduced: they allow you to group a sub-expression without having its matches stored as an item in the Groups property.
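To illustrate the difference, here is a minimal sketch. The input "ababab", the class name GroupingDemo and the helper GroupCount are made up for this example and are not from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupingDemo
{
    // Hypothetical helper: number of items in Match.Groups for a pattern.
    // Item 0 is always the whole match.
    public static int GroupCount(string input, string pattern) =>
        Regex.Match(input, pattern).Groups.Count;

    public static void Main()
    {
        // Both patterns apply + to the whole "ab" sequence and match "ababab".
        Console.WriteLine(Regex.Match("ababab", @"(?:ab)+").Value); // ababab
        Console.WriteLine(Regex.Match("ababab", @"(ab)+").Value);   // ababab

        // Only the capturing version adds an extra item to Groups.
        Console.WriteLine(GroupCount("ababab", @"(?:ab)+")); // 1
        Console.WriteLine(GroupCount("ababab", @"(ab)+"));   // 2
    }
}
```

In both cases the text matched by the group is part of the overall match; the noncapturing form only skips the bookkeeping of a separate Groups entry.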

You have observed that your Groups property still contains one item. That is expected: the first item (index 0) is always the match of the complete expression. From the docs:

If the regular expression engine can find a match, the first element of the GroupCollection object returned by the Groups property contains a string that matches the entire regular expression pattern.


You can still use groups to achieve what you want, by placing the string you want to capture into a group:

\d(\w)

(I have left out the noncapturing group, as it does not change anything in your expression.)

With this modified expression, the Groups property in your match should have 2 items:

  1. The complete match (of \d\w)
  2. Only the part of the above string you seem to be interested in, matched by \w
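A quick way to check this against the input from the question; the class name CaptureDemo and the helper GroupValues are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns the value of every item in Match.Groups.
    public static string[] GroupValues(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        var values = new string[m.Groups.Count];
        for (int i = 0; i < values.Length; i++)
            values[i] = m.Groups[i].Value;
        return values;
    }

    public static void Main()
    {
        string[] groups = GroupValues("b3a", @"\d(\w)");
        Console.WriteLine(groups.Length); // 2
        Console.WriteLine(groups[0]);     // 3a  (the complete match)
        Console.WriteLine(groups[1]);     // a   (the part captured by (\w))
    }
}
```

So instead of reading .Value on the match, you would read Groups[1].Value.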

Upvotes: 4

BoltClock

Reputation: 723638

Matching is not the same thing as capturing. (?:\d) simply means match a subpattern containing \d, but don't bother putting it in a capture group. Your entire pattern (?:\d)\w looks for a (?:\d) followed by a \w; it's functionally equivalent to \d\w.

If you're trying to match a \w only when it is preceded by a \d, use a lookbehind assertion instead:

System.Text.RegularExpressions.Regex.Match("b3a", @"(?<=\d)\w").Value
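For comparison, a small sketch that runs the three patterns discussed here against the same input; the class name LookbehindDemo and the helper FirstMatch are made up for the example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: value of the first match of pattern in input.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        // The noncapturing group still consumes the digit, so it is part of the match.
        Console.WriteLine(FirstMatch("b3a", @"(?:\d)\w")); // 3a
        Console.WriteLine(FirstMatch("b3a", @"\d\w"));     // 3a

        // The lookbehind only asserts that a digit precedes; it consumes nothing.
        Console.WriteLine(FirstMatch("b3a", @"(?<=\d)\w")); // a
    }
}
```

The lookbehind is zero-width, which is why the digit is checked but never becomes part of Value.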

Upvotes: 8

falsetru

Reputation: 369074

A non-capturing group simply does not create a capture group. The text it matches is still included in the overall match.

If you want to exclude that part, use something like a lookbehind assertion:

@"(?<=\d)\w"

Upvotes: 4
