Achilles

Reputation: 1734

How do I make Regex capture only named groups

According to the Regex documentation, RegexOptions.ExplicitCapture makes the Regex capture only named groups such as (?<groupName>...), but in practice it behaves a little differently.

Consider these lines of code:

static void Main(string[] args) {
    Regex r = new Regex(
        @"(?<code>^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$))"
        , RegexOptions.ExplicitCapture
    );
    var x = r.Match("32/123/03");
    r.GetGroupNames().ToList().ForEach(gn => {
        Console.WriteLine("GroupName:{0,5} --> Value: {1}", gn, x.Groups[gn].Success ? x.Groups[gn].Value : "");
    });
}

When you run this snippet, the result contains a group named 0, even though my regex has no group named 0!

GroupName:    0 --> Value: 32/123/03  
GroupName: code --> Value: 32/123/03  
GroupName:   l1 --> Value: 32  
GroupName:   l2 --> Value: 123  
GroupName:   l3 --> Value: 03  
Press any key to continue . . .  

Could somebody please explain this behavior to me?

Upvotes: 1

Views: 1716

Answers (2)

Nicholas Carey

Reputation: 74365

You always have group 0: that's the entire match. The remaining groups are numbered starting from 1, based on the ordinal position of the opening parenthesis that defines each group. Your regular expression (formatted for clarity):

(?<code>
  ^
  (?<l1> [\d]{2} )
  /
  (?<l2> [\d]{3} )
  /
  (?<l3> [\d]{2} )
  $
|
  ^
  (?<l1>[\d]{2})
  /
  (?<l2>[\d]{3})
  $
|
   (?<l1> ^[\d]{2} $ )
)
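Since group 0 cannot be switched off, one option is to filter it out when listing groups. The following is a minimal sketch, assuming that any group whose name parses as an integer is a numbered group (with RegexOptions.ExplicitCapture that should only be "0"). NamedGroupValues is a hypothetical helper, not part of the Regex API, and the pattern is a shortened stand-in for the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class NamedGroupsDemo
{
    // Hypothetical helper (not part of the Regex API): returns only the
    // groups whose names are not plain numbers, which drops group "0".
    public static Dictionary<string, string> NamedGroupValues(Regex regex, string input)
    {
        Match match = regex.Match(input);
        return regex.GetGroupNames()
            .Where(name => !int.TryParse(name, out _))   // skip "0", "1", ...
            .ToDictionary(name => name, name => match.Groups[name].Value);
    }

    static void Main()
    {
        // Assumed pattern for illustration; any pattern with named groups works.
        var regex = new Regex(@"^(?<l1>\d{2})(/(?<l2>\d{3})(/(?<l3>\d{2}))?)?$",
                              RegexOptions.ExplicitCapture);

        foreach (var pair in NamedGroupValues(regex, "32/123/03"))
            Console.WriteLine("GroupName:{0,5} --> Value: {1}", pair.Key, pair.Value);
    }
}
```

Groups that did not take part in the match still appear in the dictionary, with an empty string as their value.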

Your expression will backtrack, so you might consider simplifying your regular expression. This is probably clearer and more efficient:

static Regex rxCode = new Regex(@"
  ^                    # match start-of-input (no Multiline option), followed by
  (?<code>             # a mandatory group ('code'), consisting of
    (?<g1> \d\d )      # - 2 decimal digits ('g1'), followed by
    (                  # - an optional group, consisting of
      /                #   - a literal '/', followed by
      (?<g2> \d\d\d )  #   - 3 decimal digits ('g2'), followed by
      (                #   - an optional group, consisting of
        /              #     - a literal '/', followed by
        (?<g3> \d\d )  #     - 2 decimal digits ('g3')
      )?               #     - END: optional group
    )?                 #   - END: optional group
  )                    # - END: named group ('code'), followed by
  $                    # - end-of-input (or just before a final newline)
" , RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture );

Once you have that, something like this:

string[] texts = { "12" , "12/345" , "12/345/67" , } ;

foreach ( string text in texts )
{
  Match m = rxCode.Match( text ) ;
  Console.WriteLine("{0}: match was {1}" , text , m.Success ? "successful" : "NOT successful" ) ;
  if ( m.Success )
  {
    Console.WriteLine( "  code: {0}" , m.Groups["code"].Value ) ;
    Console.WriteLine( "  g1: {0}" , m.Groups["g1"].Value ) ;
    Console.WriteLine( "  g2: {0}" , m.Groups["g2"].Value ) ;
    Console.WriteLine( "  g3: {0}" , m.Groups["g3"].Value ) ;
  }
}

produces the expected output:

12: match was successful
  code: 12
  g1: 12
  g2:
  g3:
12/345: match was successful
  code: 12/345
  g1: 12
  g2: 345
  g3:
12/345/67: match was successful
  code: 12/345/67
  g1: 12
  g2: 345
  g3: 67
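The empty g2 and g3 values in that output come from optional groups that did not take part in the match: Group.Value is an empty string in that case. If the difference between "absent" and "empty" matters, Group.Success can be checked instead. This is a rough sketch; ValueOrNull is a hypothetical helper, and the pattern is a compact version of the one above written without IgnorePatternWhitespace.

```csharp
using System;
using System.Text.RegularExpressions;

static class OptionalGroupDemo
{
    // Hypothetical helper: returns the group's value, or null when the
    // group did not take part in the match.
    public static string ValueOrNull(Match match, string groupName)
    {
        Group group = match.Groups[groupName];
        return group.Success ? group.Value : null;
    }

    static void Main()
    {
        var rx = new Regex(@"^(?<g1>\d\d)(/(?<g2>\d\d\d)(/(?<g3>\d\d))?)?$",
                           RegexOptions.ExplicitCapture);

        Match m = rx.Match("12/345");
        Console.WriteLine("g2: {0}", ValueOrNull(m, "g2") ?? "(not matched)"); // g2: 345
        Console.WriteLine("g3: {0}", ValueOrNull(m, "g3") ?? "(not matched)"); // g3: (not matched)
    }
}
```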

Upvotes: 1

GRUNGER

Reputation: 496

named group

^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$)


Try this (I removed the first group from your regex). See demo.

Upvotes: 0
