morleyc

Reputation: 2431

.NET regex match returning too many elements

Per this question/answer, I use the following regex to split a string such as name (100) into the name and the number in brackets, to give:

  1. Name to the left of the opening bracket, with whitespace left/right stripped
  2. The number in the brackets

With my C# code:

var found = Regex.Match("morleyc (1005)", @"(\S*)\s*\((\d*)", RegexOptions.IgnoreCase);

I get an array of 3 items, whereas I would expect a 2 element array containing the 2nd and 3rd items only:

morleyc (1005
morleyc
1005

This is what I expect (as per the regexstorm.net elements):

morleyc
1005

Please advise what I am doing wrong in my code?

.net fiddle @ https://dotnetfiddle.net/5DVWPs

Upvotes: 0

Views: 148

Answers (3)

Dmitrii Bychenko

Reputation: 186668

Probably, you want

 @"(?<name>\w+)\s*\((?<number>[0-9]+)\)"

pattern, where

 \w+        - one or more word (letter, digit or underscore) characters for name
 \s*        - optional (zero or more) whitespaces
 \([0-9]+\) - one or more digits in parentheses for number

Note named capturing groups:

 (?<name> ... )    - part of the match which stands for name
 (?<number> ... )  - part of the match which stands for number

If name can contain letters only (no digits are allowed) you can put

 @"(?<name>\p{L}+)\s*\((?<number>[0-9]+)\)"

pattern, where \p{L} stands for a Unicode letter.

Demo:

var found = Regex.Match(
    "morleyc (1005)",
    @"(?<name>\w+)\s*\((?<number>[0-9]+)\)",
    RegexOptions.IgnoreCase);

Console.WriteLine($"Name: {found.Groups["name"].Value}");
Console.WriteLine($"Number: {found.Groups["number"].Value}");

Outcome:

Name: morleyc
Number: 1005

Fiddle
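
To make the difference between the two name patterns above concrete, here is a minimal sketch. The input "user42 (1005)" is a hypothetical example, not taken from the question; it only serves to show that \w+ accepts digits in the name while \p{L}+ does not.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a name that contains digits.
string input = "user42 (1005)";

// \w+ allows digits (and underscores) in the name.
Match word = Regex.Match(input, @"(?<name>\w+)\s*\((?<number>[0-9]+)\)");

// \p{L}+ allows letters only, so "user42" directly before "(" cannot match.
Match letter = Regex.Match(input, @"(?<name>\p{L}+)\s*\((?<number>[0-9]+)\)");

Console.WriteLine(word.Groups["name"].Value);   // user42
Console.WriteLine(word.Groups["number"].Value); // 1005
Console.WriteLine(letter.Success);              // False
```

Which of the two is appropriate depends on what counts as a valid name in your data.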

Upvotes: 1

Andrei

Reputation: 1036

You did it correctly. According to .NET documentation:

the first element of the GroupCollection object (the element at index 0) returned by the Groups property contains a string that matches the entire regular expression pattern

So a regex pattern with 2 capture groups returns 3 elements:

  1. string that matches the pattern
  2. 1st group
  3. 2nd group
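
As an illustration of the list above, this sketch runs the pattern from the question and then skips index 0 to get only the captured values. Using LINQ (Cast, Skip, Select) is just one possible way to do this, not something the question or the documentation prescribes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

Match found = Regex.Match("morleyc (1005)", @"(\S*)\s*\((\d*)");

// Index 0 is the whole match; indexes 1 and 2 are the capture groups.
Console.WriteLine(found.Groups.Count);    // 3
Console.WriteLine(found.Groups[0].Value); // morleyc (1005

// Skip the full match to keep only the two captured values.
string[] captured = found.Groups
    .Cast<Group>()
    .Skip(1)
    .Select(g => g.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", captured)); // morleyc, 1005
```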

Upvotes: 1

The fourth bird

Reputation: 163277

The morleyc (1005 part in the result is the full match (group 0). The pattern also does not match the closing ).

You could check if there is a match, and if there is, get the group 1 and group 2 values only.

Note that in the original pattern almost everything is optional except the (, so it can also match a single (.

var found = Regex.Match("morleyc (1005)", @"(\S*)\s*\((\d*)\)", RegexOptions.IgnoreCase);
if (found.Success) {
    Console.WriteLine(found.Groups[1].Value);
    Console.WriteLine(found.Groups[2].Value);
}

See the fiddle.

Output

morleyc
1005


A bit more specific pattern could be:

(\S+)[\p{Zs}\t]+\(([0-9]+)\)
  • (\S+) Capture group 1, match 1+ non whitespace chars
  • [\p{Zs}\t]+ Match 1 or more spaces or tabs (\s can also match a newline)
  • \(([0-9]+)\) Capture group 2, match 1+ digits 0-9 between the matching ( and )

.NET regex demo
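
A short sketch comparing the pattern from the question with the more specific one. The variable names loose and strict and the test inputs are chosen here for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

string loose  = @"(\S*)\s*\((\d*)";              // pattern from the question
string strict = @"(\S+)[\p{Zs}\t]+\(([0-9]+)\)"; // more specific pattern

// The loose pattern accepts a lone opening parenthesis.
Console.WriteLine(Regex.IsMatch("(", loose));    // True
Console.WriteLine(Regex.IsMatch("(", strict));   // False

// The strict pattern needs a space or tab before "(", not a newline.
Console.WriteLine(Regex.IsMatch("morleyc\n(1005)", strict)); // False
Console.WriteLine(Regex.IsMatch("morleyc (1005)", strict));  // True
```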

Upvotes: 1
