Reputation: 5542
I have a very simple regex like this in C#:
(var \= 0\;)
But when I try to match this against a string that has only one occurrence of the pattern, I get multiple groups returned. The input string is:
foo bar
var = 0;
foo
I get 1 match returned by the Regex object, but inside it I see two groups, each with 1 capture containing the string I want. I need the grouping parentheses in the regex because this is part of a bigger regex, and I need this to be captured as a group. What am I doing wrong?
EDIT
This is the C# code I'm using:
private const string REGEX = "(var \\= [0]\\;)";

// REGEX is a plain string, so use the static Regex.Matches overload.
MatchCollection matches = Regex.Matches(inputStr, REGEX);
foreach (Match m in matches)
{
    foreach (Group g in m.Groups)
    {
        Console.WriteLine("group[" + g.Captures.Count + "]: '" + g.ToString() + "'");
    }
}
This is what I get:
group[1]: 'var = 0;'
group[1]: 'var = 0;'
My question is, why do I get two groups and not one?
EDIT #2:
A more complicated example shows the problem. The input:
# preceding comment
class
{
(param1 = "val1", param2 = "val2", param3 = val3)
}
[
# inside comment
setting1 = 0;
setting2 = 0;
]
The regex I'm using: (it's probably not the most obvious, but you can paste it in a regex viewer if you want to check it out)
(\#[^\n]*)?(?:[\s\r\n]*)domain(?:[\s\r\n]*)\{(?:[\s\r\n]*)\((?:[\s\r\n]*)(((?:[\s\r\n]*)(accountName(?:[\s\r\n]*)\=(?:[\s\r\n]*)\"[^"]+\"[,]?)(?:[\s\r\n]*))|((?:[\s\r\n]*)(tableName(?:[\s\r\n]*)\=(?:[\s\r\n]*)\"[^"]+\"[,]?)(?:[\s\r\n]*))|((?:[\s\r\n]*)(cap(?:[\s\r\n]*)\=(?:[\s\r\n]*)[\d]+[,]?)(?:[\s\r\n]*))|((?:[\s\r\n]*)(MinPartitionCount(?:[\s\r\n]*)\=(?:[\s\r\n]*)[\d]+[,]?)(?:[\s\r\n]*)))+\)(?:[\s\r\n]*)\}(?:[\s\r\n]*)\[(?:[\s\r\n]*)(\#[^\n]*)?(?:[\s\r\n]*)((?:[\s\r\n]*)(IsSplitEnabled(?:[\s\r\n]*)\=(?:[\s\r\n]*)[0|1](?:[\s\r\n]*)\;)(?:[\s\r\n]*)|(?:[\s\r\n]*)(IsMergeEnabled(?:[\s\r\n]*)\=(?:[\s\r\n]*)[0|1](?:[\s\r\n]*)\;)(?:[\s\r\n]*))*(?:[\s\r\n]*)\]
And I'm getting:
group:1: '# preceding comment
domain
{
(param1 = "val1", param2 = "val2", param3 = val3)
}
[
# inside comment
setting1 = 0;
setting2 = 0;
]'
group:1: '# preceding comment'
group:3: 'cap = 1200'
group:1: 'param1 = "val1", '
group:1: 'param1 = "val1",'
group:1: 'param2 = "val2", '
group:1: 'param2 = "val2",'
group:1: 'param3 = val3'
group:1: 'param3 = val3'
group:1: '# inside comment'
group:2: 'setting1 = 0;
'
group:1: 'setting1 = 0;'
group:1: 'setting2 = 0;'
Upvotes: 1
Views: 605
Reputation: 5439
According to the documentation, the first element of the GroupCollection is the entire match, not the first group created by ().
From near the bottom of the Remarks section here:
If the regular expression engine can find a match, the first element of the GroupCollection object returned by the Groups property contains a string that matches the entire regular expression pattern. Each subsequent element represents a captured group, if the regular expression includes capturing groups.
Because of this, items 0 and 1 are identical for the regex you are currently using: the single pair of parentheses wraps the whole pattern. To see only the groups you defined, skip the first element of the GroupCollection and process the remaining ones.
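As a minimal sketch of skipping the first element, the loop below starts at index 1 so that only the groups created by parentheses are returned. The class name GroupDemo and the helper CapturedGroups are hypothetical names chosen for illustration; the input is the three-line string from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns the explicitly captured groups of the
    // first match, skipping Groups[0], which always holds the entire match.
    public static List<string> CapturedGroups(string input, string pattern)
    {
        var result = new List<string>();
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return result; // no match, so no groups to report
        }

        // Start at 1: index 0 is the whole match, not a user-defined group.
        for (int i = 1; i < m.Groups.Count; i++)
        {
            result.Add(m.Groups[i].Value);
        }
        return result;
    }

    public static void Main()
    {
        string input = "foo bar\nvar = 0;\nfoo";
        foreach (string s in CapturedGroups(input, @"(var \= 0\;)"))
        {
            Console.WriteLine("'" + s + "'");
        }
    }
}
```

With the question's input this should list a single entry, since the pattern defines only one capturing group.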
After investigating the additional data, I think I may have found the cause of your duplicates.
I believe that you are seeing more than one Match, and so the outer foreach loop runs twice, not once. This is because there are 2 separate lines with "= 0;" in the example.
Here is LINQPad example code that shows 2 matches being found, and therefore duplicate groups being output. (Note: I tested with a shortened form of the simple regex you provided, without the leading "var", since the long regex didn't produce any matches.)
static string inputStr = "# preceding comment \r\n" +
    "class\r\n" +
    "{\r\n" +
    " (param1 = \"val1\", param2 = \"val2\", param3 = val3)\r\n" +
    "}\r\n" +
    "[\r\n" +
    " # inside comment\r\n" +
    " setting1 = 0;\r\n" +
    " setting2 = 0;\r\n" +
    "]\r\n";

const string REGEX = "(\\= [0]\\;)";

void Main()
{
    var regex = new System.Text.RegularExpressions.Regex(REGEX);
    MatchCollection matches = regex.Matches(inputStr);
    Console.WriteLine("Matches:{0}", matches.Count);
    int matchCnt = 0;
    foreach (Match m in matches)
    {
        int groupCnt = 0;
        foreach (Group g in m.Groups)
        {
            Console.WriteLine("match[{0}] group[{1}]: Captures:{2} '{3}'", matchCnt, groupCnt, g.Captures.Count, g);
            //g.Dump();
            groupCnt++;
        }
        matchCnt++;
    }
    Console.WriteLine("Done!");
}
And here is the output generated by LinqPad when this code runs:
Matches:2
match[0] group[0]: Captures:1 '= 0;'
match[0] group[1]: Captures:1 '= 0;'
match[1] group[0]: Captures:1 '= 0;'
match[1] group[1]: Captures:1 '= 0;'
Done!
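To tell the two matches apart, one option is to read only Groups[1] from each Match and print its position in the input. The sketch below assumes a cut-down input containing just the two setting lines; MatchDemo and FirstGroupPerMatch are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Hypothetical helper: for every match, records the offset and text of
    // group 1 only, so the whole-match entry (Groups[0]) is never repeated.
    public static List<KeyValuePair<int, string>> FirstGroupPerMatch(string input, string pattern)
    {
        var result = new List<KeyValuePair<int, string>>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Group g = m.Groups[1];
            result.Add(new KeyValuePair<int, string>(g.Index, g.Value));
        }
        return result;
    }

    public static void Main()
    {
        // Assumed input: only the two setting lines from the example.
        string input = "setting1 = 0;\r\nsetting2 = 0;\r\n";
        foreach (var pair in FirstGroupPerMatch(input, @"(\= [0]\;)"))
        {
            Console.WriteLine("offset {0}: '{1}'", pair.Key, pair.Value);
        }
    }
}
```

The differing offsets show that identical-looking group values come from separate matches rather than from extra groups inside one match.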
Upvotes: 2