Reputation: 1364
VB2012: I have a string I want to parse out. It has a fixed start and end string but inside there are repetitive strings.
Input string looks like this with much more of the same type of data between START and END.
START;data[0][1]="2000";data[0][2]="2015-09-25";data[0][3]="XYZ";END;
My current regex looks like this
(data\[(?<row>\d{1,2})]\[(?<col>\d{1,2})]="(?<val>.*?)";)
That works great and matches the repetitive strings inside:
Match Number | Match Text               | Group 1                  | row | col | val
0            | data[0][1]="2000";       | data[0][1]="2000";       | 0   | 1   | 2000
1            | data[0][2]="2015-09-25"; | data[0][2]="2015-09-25"; | 0   | 2   | 2015-09-25
2            | data[0][3]="XYZ";        | data[0][3]="XYZ";        | 0   | 3   | XYZ
I want to make the match a bit more accurate by matching the START string, then the repetitive strings, then an END string. My attempt has been of the form:
START;(data\[(?<row>\d{1,2})]\[(?<col>\d{1,2})]="(?<val>.*?)";)*END;
But that gives me an output where the different groups are on their own and not part of a bigger match. I'm stuck on what I should try.
Upvotes: 0
Views: 1897
Reputation: 51350
Let's take your example:
START;data[0][1]="2000";data[0][2]="2015-09-25";data[0][3]="XYZ";END;
along with your second regex:
START;(data\[(?<row>\d{1,2})]\[(?<col>\d{1,2})]="(?<val>.*?)";)*END;
So, what do we get here?
The pattern is wrapped in START;( ...[values]... )*END; and you're using a * quantifier. There are further capture groups in the [values] part.
So, a match looks like this:
START;data[0][1]="2000";data[0][2]="2015-09-25";data[0][3]="XYZ";END;
           R  C   VVVV       R  C   VVVVVVVVVV       R  C   VVV       <-- groups
      \________________/\______________________/\_______________/     <-- [values]
\___________________________________________________________________/ <-- full match
The [values] part of the regex matches 3 times. R is the value captured by the row group, C is what's captured by col, and VVV is what's captured by val.
In such a case, most other regex engines would throw away all but the last capture, and you'd get only the values 0, 3 and XYZ from your match.
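In .NET, the same "last capture only" view is what you get when you read Group.Value directly. The sketch below assumes the sample input from the question; the module and function names (LastCaptureDemo, LastCaptures) are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LastCaptureDemo
    ' Hypothetical helper: returns "row,col,val" as reported by Group.Value,
    ' which holds only the LAST capture of each group.
    Function LastCaptures(input As String) As String
        Dim pattern As String =
            "START;(data\[(?<row>\d{1,2})]\[(?<col>\d{1,2})]=""(?<val>.*?)"";)*END;"
        Dim m As Match = Regex.Match(input, pattern)
        If Not m.Success Then Return Nothing
        Return m.Groups("row").Value & "," &
               m.Groups("col").Value & "," &
               m.Groups("val").Value
    End Function

    Sub Main()
        Dim input As String =
            "START;data[0][1]=""2000"";data[0][2]=""2015-09-25"";data[0][3]=""XYZ"";END;"
        Console.WriteLine(LastCaptures(input)) ' prints 0,3,XYZ
    End Sub
End Module
```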
But .NET supports multiple captures per group. So you can extract all the captured substrings, for each iteration of the enclosing * quantifier.
Each item in Match.Groups corresponds to a capture group in the pattern (e.g. the (?<row>...) group).
Each item in Match.Groups("row").Captures corresponds to one capture of that group, made in one iteration of a quantifier during the match.
This means that when a given capture group is used several times during a match, you'll get several captures for it.
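As a minimal sketch of reading those captures, assuming the sample input from the question: the three named groups are all mandatory inside the same repeated group, so their Captures collections should have the same length and line up by index. The names CapturesDemo and ExtractCells are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CapturesDemo
    ' Hypothetical helper: returns one "row,col,val" string
    ' per iteration of the * quantifier.
    Function ExtractCells(input As String) As List(Of String)
        Dim pattern As String =
            "START;(data\[(?<row>\d{1,2})]\[(?<col>\d{1,2})]=""(?<val>.*?)"";)*END;"
        Dim cells As New List(Of String)
        Dim m As Match = Regex.Match(input, pattern)
        If Not m.Success Then Return cells

        Dim rows As CaptureCollection = m.Groups("row").Captures
        Dim cols As CaptureCollection = m.Groups("col").Captures
        Dim vals As CaptureCollection = m.Groups("val").Captures

        ' Index i in each collection belongs to the i-th iteration.
        For i As Integer = 0 To rows.Count - 1
            cells.Add(rows(i).Value & "," & cols(i).Value & "," & vals(i).Value)
        Next
        Return cells
    End Function

    Sub Main()
        Dim input As String =
            "START;data[0][1]=""2000"";data[0][2]=""2015-09-25"";data[0][3]=""XYZ"";END;"
        For Each cell As String In ExtractCells(input)
            Console.WriteLine(cell)
        Next
        ' prints:
        ' 0,1,2000
        ' 0,2,2015-09-25
        ' 0,3,XYZ
    End Sub
End Module
```

If the input has no START;...END; wrapper, the match fails and the list comes back empty, which is the extra strictness the question was after.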
Contrast it with the first regex:
(data\[(?<row>\d{1,2})]\[(?<col>\d{1,2})]="(?<val>.*?)";)
Let's look at the matches:
START;data[0][1]="2000";data[0][2]="2015-09-25";data[0][3]="XYZ";END;
           R  C   VVVV       R  C   VVVVVVVVVV       R  C   VVV       <-- groups
      \________________/\______________________/\_______________/     <-- whole matches
Each match has only one capture instance for each capturing group.
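For comparison, here is a rough sketch of the first regex used with Regex.Matches on the same sample input. It yields three separate matches, and each named group has exactly one capture per match. MatchesDemo and DescribeMatches are illustrative names, not part of any API.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchesDemo
    ' Hypothetical helper: returns "row,col,val captures=N" for every match,
    ' where N is the number of captures of the val group in that match.
    Function DescribeMatches(input As String) As List(Of String)
        Dim pattern As String =
            "(data\[(?<row>\d{1,2})]\[(?<col>\d{1,2})]=""(?<val>.*?)"";)"
        Dim lines As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            lines.Add(m.Groups("row").Value & "," &
                      m.Groups("col").Value & "," &
                      m.Groups("val").Value &
                      " captures=" & m.Groups("val").Captures.Count.ToString())
        Next
        Return lines
    End Function

    Sub Main()
        Dim input As String =
            "START;data[0][1]=""2000"";data[0][2]=""2015-09-25"";data[0][3]=""XYZ"";END;"
        For Each line As String In DescribeMatches(input)
            Console.WriteLine(line)
        Next
        ' prints:
        ' 0,1,2000 captures=1
        ' 0,2,2015-09-25 captures=1
        ' 0,3,XYZ captures=1
    End Sub
End Module
```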
Upvotes: 1