Reputation: 2431
Per this question/answer, I use the following regex to parse input such as name (100),
splitting the name from the number in brackets.
With my C# code:
var found = Regex.Match("morleyc (1005)", @"(\S*)\s*\((\d*)", RegexOptions.IgnoreCase);
I get an array of 3 items, whereas I would expect a 2 element array containing the 2nd and 3rd items only:
morleyc (1005
morleyc
1005
This is what I expect (as per the elements shown on regexstorm.net):
morleyc
1005
Please advise what I am doing wrong in my code?
.net fiddle @ https://dotnetfiddle.net/5DVWPs
Upvotes: 0
Views: 148
Reputation: 186668
Probably, you want
@"(?<name>\w+)\s*\((?<number>[0-9]+)\)"
pattern, where
\w+ - one or more word (letter, digit or underscore) characters for name
\s* - optional (zero or more) whitespaces
\([0-9]+\) - one or more digits in parentheses for number
Note named capturing groups:
(?<name> ... ) - part of the match which stands for name
(?<number> ... ) - part of the match which stands for number
If name can contain letters only (no digits are allowed) you can put
@"(?<name>\p{L}+)\s*\((?<number>[0-9]+)\)"
pattern, where \p{L} stands for a Unicode letter
Demo:
var found = Regex.Match(
"morleyc (1005)",
@"(?<name>\w+)\s*\((?<number>[0-9]+)\)",
RegexOptions.IgnoreCase);
Console.WriteLine($"Name: {found.Groups["name"].Value}");
Console.WriteLine($"Number: {found.Groups["number"].Value}");
Outcome:
Name: morleyc
Number: 1005
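To show how the \w+ and \p{L}+ variants above differ, here is a minimal sketch. The input "morleyc2 (1005)" is a made-up example with a digit in the name; it is not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the name ends with a digit.
var word = Regex.Match("morleyc2 (1005)", @"(?<name>\w+)\s*\((?<number>[0-9]+)\)");
var letters = Regex.Match("morleyc2 (1005)", @"(?<name>\p{L}+)\s*\((?<number>[0-9]+)\)");

// \w+ accepts the digit, so the whole name is captured.
Console.WriteLine(word.Groups["name"].Value);  // morleyc2

// \p{L}+ cannot consume the digit before the whitespace, so nothing matches.
Console.WriteLine(letters.Success);            // False
```

If names with digits should be rejected rather than silently matched, the \p{L}+ form is the stricter choice.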
Upvotes: 1
Reputation: 1036
You did it correctly. According to the .NET documentation:
the first element of the GroupCollection object (the element at index 0) returned by the Groups property contains a string that matches the entire regular expression pattern
So a regex pattern with 2 capturing groups will return 3 results: the full match at index 0, followed by the two captured groups at indexes 1 and 2.
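A small sketch that lists every entry of the Groups collection for the pattern from the question makes this visible:

```csharp
using System;
using System.Text.RegularExpressions;

var found = Regex.Match("morleyc (1005)", @"(\S*)\s*\((\d*)");

// Groups[0] is always the entire match; the capturing groups start at index 1.
for (int i = 0; i < found.Groups.Count; i++)
{
    Console.WriteLine($"Groups[{i}] = \"{found.Groups[i].Value}\"");
}
// Groups[0] = "morleyc (1005"
// Groups[1] = "morleyc"
// Groups[2] = "1005"
```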
Upvotes: 1
Reputation: 163277
The morleyc (1005 part in the result is the full match. The pattern also does not match the closing ).
You could check if there is a match, and if there is, get the group 1 and group 2 values only.
Note that in the pattern, almost everything is optional except the (, so it can also match a single ( on its own.
var found = Regex.Match("morleyc (1005)", @"(\S*)\s*\((\d*)\)", RegexOptions.IgnoreCase);
if (found.Success) {
Console.WriteLine(found.Groups[1].Value);
Console.WriteLine(found.Groups[2].Value);
}
See the fiddle.
Output
morleyc
1005
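As a quick check of the note that the original pattern (the one from the question, without the closing parenthesis) can match a single (, here is a minimal sketch:

```csharp
using System;
using System.Text.RegularExpressions;

// The input has no name and no digits, only an opening parenthesis.
var loose = Regex.Match("(", @"(\S*)\s*\((\d*)");

Console.WriteLine(loose.Success);                 // True
Console.WriteLine(loose.Groups[1].Value.Length);  // 0 (empty name)
Console.WriteLine(loose.Groups[2].Value.Length);  // 0 (empty number)
```

The match succeeds even though both groups are empty, so checking found.Success alone is not enough with that pattern.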
A bit more specific pattern could be:
(\S+)[\p{Zs}\t]+\(([0-9]+)\)
(\S+) - Capture group 1, match 1+ non-whitespace chars
[\p{Zs}\t]+ - Match 1 or more spaces or tabs (\s can also match a newline)
\(([0-9]+)\) - Capture group 2, match 1+ digits 0-9 between a matching ( and )
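A minimal sketch of the difference this makes. The second input, with a newline between the name and the number, is a made-up example:

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"(\S+)[\p{Zs}\t]+\(([0-9]+)\)";

var sameLine = Regex.Match("morleyc (1005)", pattern);
// Hypothetical input: name and number are separated by a newline only.
var acrossLines = Regex.Match("morleyc\n(1005)", pattern);

Console.WriteLine(sameLine.Success);          // True
Console.WriteLine(sameLine.Groups[1].Value);  // morleyc
Console.WriteLine(sameLine.Groups[2].Value);  // 1005
Console.WriteLine(acrossLines.Success);       // False, a newline is not in [\p{Zs}\t]
```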
Upvotes: 1