Regex: captures, groups, confusion

Question

I can't seem to figure out captures and groups in Regex (.NET).

Let's say I have the following input string, where each letter is actually a placeholder for a more complex regular expression (so simple character exclusion won't work):

CBDAEDBCEFBCD

Or, more generically, here is a string pattern written in 'regex':

(C|B|D)*A(E*)(D|B|C)*(E*)F(B|C|D)*

There will only be one A and one F. I need to capture as individual 'captures' (or matches or groups) all instances of B, C, D (which in my app are more complex groups) that occur after A and before F. I also need A and F. I don't need E. And I don't need the C,B,D before the A or the B,C,D after the F.

I would expect the correct result to be:

Groups["start"] (1 capture) = A
Groups["content"] (3 captures)  
  Captures[0] = D  
  Captures[1] = B
  Captures[2] = C
Groups["end"] (1 capture) = F

I tried a few feeble attempts but none of them worked.

This one "incorrectly" captures only the last C before EF in the sample string above (though it correctly gives start = A, end = F):

(?<=(?<start>A)).+(?<content>B|C|D).+(?=(?<end>F))

Same result as above (I just added a + after (?<content>B|C|D)):

(?<=(?<start>A)).+(?<content>B|C|D)+.+(?=(?<end>F))

Got rid of the look-around stuff... same result as above:

(?<start>A).+(?<content>B|C|D)+.+(?<end>F)

And then my good-for-nothing brain went on strike.

So, what's the right way to approach this? Are look-arounds really needed for this or not?

Thanks!

Alan Moore · Accepted Answer

Yeah, forget the lookarounds, they just complicate things needlessly. But I suspect your final regex will work if you make that first .+ reluctant:

(?<start>A).+?(?<content>B|C|D)+.+(?<end>F)

EDIT: yep:

string s = "CBDAEDBCEFBCD";
Regex r = new Regex(@"(?<start>A).+?(?<content>B|C|D)+.+(?<end>F)");

foreach (Match m in r.Matches(s))
{
  Console.WriteLine(@"Groups[""start""] = {0}", m.Groups["start"]);
  foreach (Capture c in m.Groups["content"].Captures)
  {
    Console.WriteLine(@"Capture[""content""] = {0}", c.Value);
  }
  Console.WriteLine(@"Groups[""end""] = {0}", m.Groups["end"]);
}

output:

Groups["start"] = A
Capture["content"] = D
Capture["content"] = B
Capture["content"] = C
Groups["end"] = F
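
To see why the single ? matters, here is a small side-by-side sketch of the greedy and reluctant versions on the sample string. The class name GreedyVsLazy and the helper ContentCaptures are made up for this illustration; only the standard System.Text.RegularExpressions API is used. With a greedy first .+, the engine swallows as much as it can after A and backtracks only far enough to leave a single character for the content group, so the + on that group never gets a chance to repeat.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: joins every capture of the "content" group
    // from the first match into a comma-separated string.
    public static string ContentCaptures(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return string.Join(",",
            m.Groups["content"].Captures.Cast<Capture>().Select(c => c.Value));
    }

    public static void Main()
    {
        string s = "CBDAEDBCEFBCD";

        // Greedy first .+ : it consumes "EDB", leaving only the last C for "content".
        Console.WriteLine(ContentCaptures(
            @"(?<start>A).+(?<content>B|C|D)+.+(?<end>F)", s));   // C

        // Reluctant first .+? : it stops after "E", so "content" repeats over D, B, C.
        Console.WriteLine(ContentCaptures(
            @"(?<start>A).+?(?<content>B|C|D)+.+(?<end>F)", s));  // D,B,C
    }
}
```

Note that both patterns still need at least one character between A and the content, and at least one between the content and F, because of the two .+ parts. That holds for the sample string; whether it holds for your real input is something you'd have to check.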
