Reputation: 5241
I can't seem to figure out captures + groups in Regex (.net).
Let's say I have the following input string, where each letter is actually a placeholder for a more complex regular expression (so simple character exclusion won't work):
CBDAEDBCEFBCD
Or, more generically, here is a string pattern written in 'regex':
(C|B|D)*A(E*)(D|B|C)*(E*)F(B|C|D)*
There will only be one A and one F. I need to capture as individual 'captures' (or matches or groups) all instances of B, C, D (which in my app are more complex groups) that occur after A and before F. I also need A and F. I don't need E. And I don't need the C,B,D before the A or the B,C,D after the F.
I would expect the correct result to be:
Groups["start"] (1 capture) = A
Groups["content"] (3 captures)
Captures[0] = D
Captures[1] = B
Captures[2] = C
Groups["end"] (1 capture) = F
I tried a few feeble attempts but none of them worked.
This one "incorrectly" captures only the last C before EF in the sample string above (although it correctly gets start = A and end = F):
(?<=(?<start>A)).+(?<content>B|C|D).+(?=(?<end>F))
Same results as above (I just added a + after (?<content>B|C|D)):
(?<=(?<start>A)).+(?<content>B|C|D)+.+(?=(?<end>F))
Got rid of look-around stuff... same result as above
(?<start>A).+(?<content>B|C|D)+.+(?<end>F)
And then my good-for-nothing brain went on strike.
So, what's the right way to approach this? Are look-arounds really needed for this or not?
Thanks!
Upvotes: 0
Views: 218
Reputation: 75232
Yeah, forget the lookarounds, they just complicate things needlessly. But I suspect your final regex will work if you make that first .+ reluctant:
(?<start>A).+?(?<content>B|C|D)+.+(?<end>F)
EDIT: yep:
string s = "CBDAEDBCEFBCD";
Regex r = new Regex(@"(?<start>A).+?(?<content>B|C|D)+.+(?<end>F)");
foreach (Match m in r.Matches(s))
{
    Console.WriteLine(@"Groups[""start""] = {0}", m.Groups["start"]);
    foreach (Capture c in m.Groups["content"].Captures)
    {
        Console.WriteLine(@"Capture[""content""] = {0}", c.Value);
    }
    Console.WriteLine(@"Groups[""end""] = {0}", m.Groups["end"]);
}
output:
Groups["start"] = A
Capture["content"] = D
Capture["content"] = B
Capture["content"] = C
Groups["end"] = F
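To see why the reluctant quantifier matters, here is a minimal sketch that runs the greedy and the reluctant pattern side by side on the sample string. The class name GreedyVsReluctant and the helper ContentCaptures are made up for this illustration; the two patterns are the ones from the question and from this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyVsReluctant
{
    // Hypothetical helper: returns the value of every capture that the
    // "content" group recorded in the first match (empty if no match).
    public static string[] ContentCaptures(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Groups["content"].Captures
                .Cast<Capture>()
                .Select(c => c.Value)
                .ToArray();
    }

    public static void Main()
    {
        const string input = "CBDAEDBCEFBCD";

        // Greedy: the first .+ grabs as much as it can, so backtracking
        // leaves only the last C for the content group.
        const string greedy = @"(?<start>A).+(?<content>B|C|D)+.+(?<end>F)";

        // Reluctant: the first .+? stops after a single character (the E),
        // so content+ is free to consume D, B and C one capture at a time.
        const string reluctant = @"(?<start>A).+?(?<content>B|C|D)+.+(?<end>F)";

        Console.WriteLine(string.Join(",", ContentCaptures(greedy, input)));    // C
        Console.WriteLine(string.Join(",", ContentCaptures(reluctant, input))); // D,B,C
    }
}
```

Note that both .+? and .+ still need at least one character each, so this pattern assumes there is something (an E in the sample) between A and the content, and between the content and F.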
Upvotes: 2
Reputation: 15799
Since you said all instances of C,B,D, I would think you'd want to use a grouping for that: [CBD]*
Also, if you're just looking for something to be after the letter A but before F, then you should be able to use those literals along with some exclusions.
Here's a pattern I came up with. Group $4 should contain the letters DBC:
([^A]*)(A)([^CBDF]*)([CBD]*)([^F]*)(F)(.*)
Here's an example of this pattern in action.
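A minimal sketch of running that pattern from C# (the class name NumberedGroupsDemo and the Run helper are made up for illustration). It prints each numbered group together with its capture count, which shows that group 4 holds DBC as one string with a single capture rather than three separate captures.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberedGroupsDemo
{
    // The pattern from this answer, unchanged.
    public const string Pattern = @"([^A]*)(A)([^CBDF]*)([CBD]*)([^F]*)(F)(.*)";

    // Hypothetical helper: returns the first match of Pattern in the input.
    public static Match Run(string input)
    {
        return Regex.Match(input, Pattern);
    }

    public static void Main()
    {
        Match m = Run("CBDAEDBCEFBCD");

        // Group 0 is the whole match, so start at 1.
        for (int i = 1; i < m.Groups.Count; i++)
        {
            Console.WriteLine("Group {0} = \"{1}\" ({2} capture(s))",
                i, m.Groups[i].Value, m.Groups[i].Captures.Count);
        }
        // Group 4 prints as "DBC" with 1 capture, because the * quantifier
        // sits inside the parentheses.
    }
}
```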
The question is, what do you want if the original string is CBDAEDEBECEFBCD?
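To make that question concrete, here is a rough sketch of what happens on the interleaved string. It assumes (and this is only a guess at what the asker wants) that B, C and D should still come back as separate captures even when E's sit between them. The Alternation pattern and the names InterleavedDemo and ContentCaptures are not from the thread; they are made up for this illustration. Because it uses alternation instead of character classes, each letter could be swapped for a more complex sub-expression.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InterleavedDemo
{
    // Pattern from the first answer in this thread.
    public const string Reluctant = @"(?<start>A).+?(?<content>B|C|D)+.+(?<end>F)";

    // Hypothetical alternative: between A and F, each step is either an E
    // (skipped, not captured) or one of B, C, D (captured into "content").
    public const string Alternation = @"(?<start>A)(?:E|(?<content>B|C|D))*(?<end>F)";

    // Returns the value of every capture recorded for the "content" group.
    public static string[] ContentCaptures(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Groups["content"].Captures
                .Cast<Capture>()
                .Select(c => c.Value)
                .ToArray();
    }

    public static void Main()
    {
        const string interleaved = "CBDAEDEBECEFBCD";

        // The content+ run stops at the first E, so only D is captured.
        Console.WriteLine(string.Join(",", ContentCaptures(Reluctant, interleaved)));       // D

        // The alternation steps over each E and keeps collecting captures.
        Console.WriteLine(string.Join(",", ContentCaptures(Alternation, interleaved)));     // D,B,C

        // It gives the same captures on the original sample string.
        Console.WriteLine(string.Join(",", ContentCaptures(Alternation, "CBDAEDBCEFBCD"))); // D,B,C
    }
}
```

If only the first contiguous run of B/C/D after A is wanted, the stricter patterns are the better fit; the alternation version is just one possible answer to the question above.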
Upvotes: 0