Archie

Reputation: 2579

Regex to get all possible matches for a pattern in C#

I'm learning regex and need to get all possible matches for a pattern out of a string.

If my input is:

case a
when cond1 
then stmt1;
when cond2 
then stmt2;
end case;

I need to get matches whose groups are as follows:

Group1:

  1. "cond1"
  2. "stmt1;"

and Group2:

  1. "cond2"
  2. "stmt2;"

Is it possible to get such groups using any regex?

Upvotes: 1

Views: 18014

Answers (3)

user1228

I don't think this is possible, primarily because any group that matches when...then... is going to match all of them, creating multiple captures within the same group.

I'd suggest using this regex:

(?:when\s+(.*?)\s*\nthen\s+(.*?)\s*\n)+?

which results in:

Match 1:
* Group 1: cond1
* Group 2: stmt1;
Match 2:
* Group 1: cond2
* Group 2: stmt2;
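
As a minimal sketch of reading the groups one match at a time, here is the same idea in Java (the constructs used, `\s`, lazy `.*?` and `\n`, are also available in .NET regex syntax, where you would loop over `Regex.Matches` instead). The class name `WhenThenPairs`, the `extract` helper and the simplified pattern are assumptions made for illustration, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhenThenPairs {
    // Assumed pattern: one "when ..." line followed by one "then ..." line per match.
    // \s+ and \s* keep surrounding blanks (and a trailing \r) out of the groups.
    private static final Pattern PAIR =
        Pattern.compile("when\\s+(.*?)\\s*\\nthen\\s+(.*?)\\s*\\n");

    // Returns one {condition, statement} array per match, in input order.
    public static List<String[]> extract(String input) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.add(new String[] { m.group(1), m.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input =
            "case a\n"
          + "when cond1 \n"
          + "then stmt1;\n"
          + "when cond2 \n"
          + "then stmt2;\n"
          + "end case;\n";
        for (String[] p : extract(input)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

Each call to `find()` yields a separate match, so group 1 and group 2 always belong to the same when/then pair.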

Upvotes: 1

Spoike

Reputation: 121772

If this were written in Java, I would write two patterns for the parser: one to match the case blocks and one to match the when-then pairs. Here is how the latter could be written:

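import java.util.regex.Matcher;
import java.util.regex.Pattern;

// inputString is the string you get after matching the case statements...
Pattern pattern = Pattern.compile(
    "when\\s+(\\S+)\\s+"     // group 1: the condition; \s+ also spans the line break
    + "then\\s+(\\S+)");     // group 2: the statement (a single token)

Matcher matcher = pattern.matcher(inputString);  // a String is already a CharSequence
while (matcher.find()) {
    // DoWhenThen stands for whatever handles each condition/statement pair
    DoWhenThen(matcher.group(1), matcher.group(2));
}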

Note: I haven't tested this code and I'm not 100% sure about the pattern, but this is where I'd start tinkering.

Upvotes: 0

rslite

Reputation: 84683

It's possible to use regex for this provided that you don't nest your statements. For example, if your stmt1 is another case statement then all bets are off (you can't use regex for something like that; you need a proper parser).

Edit: If you really want to try it you can do it with something like (not tested, but you get the idea):

Regex t = new Regex(@"when\s+(.*?)\s+then\s+(.*?;)", RegexOptions.Singleline);
MatchCollection allMatches = t.Matches(input_string);

But as I said, this will only work for statements that are not nested.
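
To illustrate the nesting problem, here is a small Java sketch that runs the pattern above on a nested case (`Pattern.DOTALL` is Java's counterpart of `RegexOptions.Singleline`). The class `NestedCaseDemo`, the helper `firstStatement` and the nested input are made up for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedCaseDemo {
    // Same when/then pattern; DOTALL lets '.' match across line breaks.
    private static final Pattern WHEN_THEN =
        Pattern.compile("when\\s+(.*?)\\s+then\\s+(.*?;)", Pattern.DOTALL);

    // Returns the statement captured for the first when/then pair, or null if none.
    public static String firstStatement(String input) {
        Matcher m = WHEN_THEN.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String nested =
            "case a\n"
          + "when cond1\n"
          + "then case b\n"
          + "  when inner1\n"
          + "  then x;\n"
          + "  end case;\n"
          + "when cond2\n"
          + "then stmt2;\n"
          + "end case;\n";
        // The capture stops at the first ';', in the middle of the inner case.
        System.out.println(firstStatement(nested));
    }
}
```

The statement captured for cond1 ends at "then x;" and never reaches the inner "end case;", so the nested block is cut in half. The regex has no way to count how many case blocks are currently open.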

Edit 2: I changed the regex slightly to include the semicolon in the last group. This will not work exactly as you wanted: instead, it gives you multiple matches, each representing one when clause, with the condition in the first group and the statement in the second.

I don't think you can build a regex that does exactly what you want, but this should be close enough (I hope).

Edit 3: New regex that should handle multiple statements:

Regex t = new Regex(@"when\s+(.*?)\s+then\s+(.*?)(?=(when|end))", RegexOptions.Singleline);

It contains a positive lookahead so that the second group matches from 'then' up to the next 'when' or 'end'. In my test it worked with this input:

case a
when cond1 
then stmt1;
   stm1;
   stm2;stm3
when cond2 
then stmt2;
   aaa;  
   bbb;
end case;

It's case-sensitive for now, so if you need case insensitivity, add the corresponding regex flag (RegexOptions.IgnoreCase).
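
A hedged Java sketch of the lookahead variant with case-insensitive matching follows. `Pattern.DOTALL` and `Pattern.CASE_INSENSITIVE` play the roles of `RegexOptions.Singleline` and `RegexOptions.IgnoreCase`. The `\b` word boundaries and the `\s*` before the lookahead are additions not present in the regex above: they are assumed here so that a word such as "send" does not end a statement early and so that trailing whitespace stays out of the group. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhenThenLookahead {
    // Group 2 runs lazily until the lookahead sees the next "when" or "end" keyword.
    private static final Pattern CLAUSE = Pattern.compile(
        "\\bwhen\\s+(.*?)\\s+then\\s+(.*?)\\s*(?=\\b(?:when|end)\\b)",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns one {condition, statements} array per when clause, in input order.
    public static List<String[]> clauses(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher m = CLAUSE.matcher(input);
        while (m.find()) {
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Upper-case keywords to exercise the case-insensitive flag.
        String input =
            "CASE a\n"
          + "WHEN cond1 \n"
          + "THEN stmt1;\n"
          + "   stm1;\n"
          + "   stm2;stm3\n"
          + "WHEN cond2 \n"
          + "THEN stmt2;\n"
          + "   aaa;  \n"
          + "   bbb;\n"
          + "END CASE;\n";
        for (String[] c : clauses(input)) {
            System.out.println("[" + c[0] + "] => [" + c[1] + "]");
        }
    }
}
```

Because the lookahead does not consume the keyword it finds, the next `find()` can start its match at that same "when".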

Upvotes: 6
