Ufuk Can Bicici

Reputation: 3649

Weird Regex behavior in C#

I am trying to extract some alphanumeric expressions out of a longer word in C# using regular expressions. For example, I have the word "FooNo12Bee". I use the following regular expression code, which returns two results, "No12" and "No":

string alfaNumericWord = "FooNo12Bee";
Match m = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}");

If I use the following expression, without parentheses and without any alternative for "No", it works the way I expect and returns only "No12":

string alfaNumericWord = "FooNo12Bee";
Match m = Regex.Match(alfaNumericWord, @"No\d{1,3}");

What is the difference between these two expressions? Why does using parentheses produce the extra "No" result?

Upvotes: 1

Views: 123

Answers (3)

Jerry

Reputation: 71538

Parentheses in a regex create capture groups: whatever the part of the pattern inside the parentheses matches is captured and stored as a separate group of the match.
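A minimal sketch of what is going on (assuming .NET 5 or later, since it uses C# top-level statements): "FooNo12Bee" contains only one match, but that match carries two groups. Group 0 is the whole match, and group 1 is the text captured by (No|Num).

```csharp
// Sketch using C# 9 top-level statements (assumes .NET 5+).
using System;
using System.Text.RegularExpressions;

string alfaNumericWord = "FooNo12Bee";

// Pattern with a capturing group around the alternation.
Match withGroup = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}");
// Pattern without parentheses: no capturing groups at all.
Match withoutGroup = Regex.Match(alfaNumericWord, @"No\d{1,3}");

// Regex.Matches finds only ONE match in the input...
int matchCount = Regex.Matches(alfaNumericWord, @"(No|Num)\d{1,3}").Count;
Console.WriteLine($"Matches found: {matchCount}");   // Matches found: 1

// ...but that match has one entry per group: group 0 is the whole match,
// group 1 is whatever the parenthesized (No|Num) matched.
for (int i = 0; i < withGroup.Groups.Count; i++)
    Console.WriteLine($"Groups[{i}] = \"{withGroup.Groups[i].Value}\"");
// Groups[0] = "No12"
// Groups[1] = "No"

// Without parentheses there is only group 0.
Console.WriteLine($"Without parentheses: {withoutGroup.Groups.Count} group(s)"); // Without parentheses: 1 group(s)
```

So the "No" is not a second match; it is the contents of group 1 of the single match.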

If you don't want a capture group but still need a group for the alternation, use a non-capturing group instead by putting ?: right after the opening parenthesis:

Match m = Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}");
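A quick check of the non-capturing version (a sketch assuming .NET 5+ for top-level statements); "BarNum7Baz" is a made-up input used only to show that the Num alternative still works:

```csharp
// Sketch using C# 9 top-level statements (assumes .NET 5+).
using System;
using System.Text.RegularExpressions;

string alfaNumericWord = "FooNo12Bee";

// (?:...) groups the alternation for matching but does not capture it,
// so the match has only group 0 (the whole match).
Match m = Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}");

Console.WriteLine(m.Value);         // No12
Console.WriteLine(m.Groups.Count);  // 1

// Hypothetical input: the alternation still accepts "Num".
Match num = Regex.Match("BarNum7Baz", @"(?:No|Num)\d{1,3}");
Console.WriteLine(num.Value);       // Num7
```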

Alternatively, if you don't want to change the regex for some reason, you can read group 0 of the match, which always holds the whole match (and thus ignores any capture groups); in your case, that is m.Groups[0].Value, which is the same as m.Value.
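A small sketch of reading group 0 while leaving the original pattern unchanged (assuming .NET 5+ for top-level statements):

```csharp
// Sketch using C# 9 top-level statements (assumes .NET 5+).
using System;
using System.Text.RegularExpressions;

string alfaNumericWord = "FooNo12Bee";

// Original pattern, capturing group left in place.
Match m = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}");

// Group 0 always holds the entire match, regardless of any capture groups.
string whole = m.Groups[0].Value;
Console.WriteLine(whole);              // No12

// Match.Value returns the same text as Groups[0].Value.
Console.WriteLine(m.Value == whole);   // True

// The captured "No" lives only in group 1; you see it only if you ask for it.
Console.WriteLine(m.Groups[1].Value);  // No
```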

Finally, you can make the regex slightly more efficient by factoring out the common leading N:

Match m = Regex.Match(alfaNumericWord, @"N(?:o|um)\d{1,3}");
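To see that the factored pattern accepts exactly the same text, the sketch below runs both versions over a few made-up inputs (assuming .NET 5+ for top-level statements). Any speed gain is likely small, and newer .NET versions may already apply a similar optimization internally.

```csharp
// Sketch using C# 9 top-level statements (assumes .NET 5+).
using System;
using System.Text.RegularExpressions;

// Two equivalent ways to write the alternation; the second factors out
// the leading "N" so it is tested once before branching into "o" or "um".
var alternation = new Regex(@"(?:No|Num)\d{1,3}");
var factored    = new Regex(@"N(?:o|um)\d{1,3}");

// Hypothetical sample inputs, chosen only for illustration.
string[] samples = { "FooNo12Bee", "BarNum7Baz", "Nope", "XNum1234Y" };

foreach (string s in samples)
{
    string a = alternation.Match(s).Value;
    string b = factored.Match(s).Value;
    // An empty string means no match; \d{1,3} stops after three digits.
    Console.WriteLine($"{s,-12} -> \"{a}\" / \"{b}\" same: {a == b}");
}
```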

Upvotes: 6

Dick van den Brink

Reputation: 14499

It is because the parentheses create a capturing group. You can make the group non-capturing with ?:, like so: Regex.Match(alfaNumericWord, @"(?:No|Num)\d{1,3}");
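If you would rather not touch the pattern at all, .NET also offers RegexOptions.ExplicitCapture, which makes every unnamed group non-capturing; its inline form is (?n). A minimal sketch, assuming .NET 5+ for the top-level statements:

```csharp
// Sketch using C# 9 top-level statements (assumes .NET 5+).
using System;
using System.Text.RegularExpressions;

string alfaNumericWord = "FooNo12Bee";

// RegexOptions.ExplicitCapture turns every unnamed (...) group into a
// non-capturing group, so the original pattern can stay unchanged.
Match viaOption = Regex.Match(alfaNumericWord, @"(No|Num)\d{1,3}",
                              RegexOptions.ExplicitCapture);

// The same option switched on inline with (?n) at the start of the pattern.
Match viaInline = Regex.Match(alfaNumericWord, @"(?n)(No|Num)\d{1,3}");

Console.WriteLine($"{viaOption.Value}, groups: {viaOption.Groups.Count}"); // No12, groups: 1
Console.WriteLine($"{viaInline.Value}, groups: {viaInline.Groups.Count}"); // No12, groups: 1
```

Named groups such as (?<num>...) are still captured under this option, so it only suppresses the unnamed ones.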

Upvotes: 1

DoXicK

Reputation: 4812

I'm not sure what the exact term is, but it is because putting parentheses around it creates a new group. It is well explained here:

Besides grouping part of a regular expression together, parentheses also create a numbered capturing group. It stores the part of the string matched by the part of the regular expression inside the parentheses.

The regex Set(Value)? matches Set or SetValue. In the first case, the first (and only) capturing group remains empty. In the second case, the first capturing group matches Value.
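The quoted Set(Value)? example can be checked in C# like this (a sketch assuming .NET 5+ for top-level statements). In .NET, a group that did not take part in the match reports Success == false and an empty Value:

```csharp
// Sketch using C# 9 top-level statements (assumes .NET 5+).
using System;
using System.Text.RegularExpressions;

// The quoted example: an optional capturing group.
var setValue = new Regex(@"Set(Value)?");

foreach (string input in new[] { "Set", "SetValue" })
{
    Match m = setValue.Match(input);
    Group g = m.Groups[1];
    // If the optional group did not participate, Success is false and Value is "".
    Console.WriteLine($"{input}: match=\"{m.Value}\", group 1 success={g.Success}, value=\"{g.Value}\"");
}
// Set: match="Set", group 1 success=False, value=""
// SetValue: match="SetValue", group 1 success=True, value="Value"
```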

Upvotes: 1
