Reputation: 335
I'm using the C# Regex class to split one string into two parts. The source (input) string is constructed in the following way:
The first part must match PO|P|S|[1-5] (in regex syntax).
The second part can be VP|GZ|GAR|PP|NAD|TER|NT|OT|LO (again, regex syntax). The second part can occur zero or one time.
Acceptable examples are "PO" (one group), "POGAR" (both groups, PO+GAR), "POT" (P+OT)...
So I've used the following regular expression:
Regex r = new Regex("^(?<first>PO|P|S|[1-5])(?<second>VP|GZ|GAR|PP|NAD|TER|NT|OT|LO)?$");
Match match = r.Match(potentialToken);
When potentialToken is "PO", it returns 3 groups! How come? I am expecting just one group (first).
match.Groups contains {"PO","PO",""}.
The named groups are OK: match.Groups["first"] returns one capture, while match.Groups["second"].Success is false.
Upvotes: 0
Views: 1084
Reputation: 20732
When using the numbered groups, the first group is always the complete matched (sub)string (cf. docs - "the first element of the GroupCollection object returned by the Groups property contains a string that matches the entire regular expression pattern"), i.e. in your case PO.
The second element in Groups is the capture of your first named group, and the third element is the capture of your second named group - just like the two captures you can retrieve by name. If you check Success of the numbered groups, you will see that the last element (the one corresponding to your second named group) has a Success value of false as well. You can interpret this as "the group exists, but it did not match anything".
To confirm this, have a look at the output of this testing code:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Regex r = new Regex("^(?<first>PO|P|S|[1-5])(?<second>VP|GZ|GAR|PP|NAD|TER|NT|OT|LO)?$");
        Match match = r.Match("PO");
        for (int i = 0; i < match.Groups.Count; i++) {
            Console.WriteLine(string.Format("{0}: {1}; {2}", i, match.Groups[i].Success, match.Groups[i].Value));
        }
    }
}
You can run it here.
Upvotes: 1
Reputation: 73442
A regex match always has at least one group, "Group 0" at index 0, even if the pattern has no capturing groups at all.
"Group 0" is equal to the whole match the regex has made (Match.Value).
So in your case you get 3 groups: "Group 0" + "Group first" + "Group second". As mentioned, "Group second" is optional, so when it doesn't take part in the match, the .NET regex engine sets its Success to false. There is nothing surprising here; this is the expected behavior.
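If what you actually want is the number of groups that took part in the match, one option is to skip index 0 and count only the groups whose Success is true. The following is a minimal sketch, not the only way to do it; the class name GroupDemo and the variable names are made up for illustration, and it assumes System.Linq is available.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class GroupDemo
{
    static void Main()
    {
        // Same pattern as in the question.
        var r = new Regex("^(?<first>PO|P|S|[1-5])(?<second>VP|GZ|GAR|PP|NAD|TER|NT|OT|LO)?$");

        foreach (string token in new[] { "PO", "POGAR", "POT" })
        {
            Match match = r.Match(token);

            // Index 0 is the whole match, so skip it and count
            // only the groups that actually participated.
            int captured = match.Groups.Cast<Group>().Skip(1).Count(g => g.Success);
            Console.WriteLine($"{token}: {captured} captured group(s)");

            // Map each group number back to its name in the pattern.
            for (int i = 1; i < match.Groups.Count; i++)
            {
                Group g = match.Groups[i];
                Console.WriteLine($"  {r.GroupNameFromNumber(i)}: Success={g.Success}, Value=\"{g.Value}\"");
            }
        }
    }
}
```

For "PO" this reports one captured group, and for "POGAR" and "POT" it reports two. Groups.Count itself stays at 3 in every case, because it reflects the groups defined in the pattern (plus group 0), not the groups that matched.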
Upvotes: 1