Alex M

Reputation: 1304

.NET Regex: Get all matched groups

I'm stuck on a regex problem in .NET. For example, I have the following regex pattern: (?'group1'A|C)|(?'group2'B|C)|(?'group3'A|B|C)

When I match "AXYZ", I get a Match object that contains Value and Groups. Looking at Groups, only one group has Success set to true: group1 (group3 is false). When I match "BXYZ", only group2 has Success set to true (group3 is again false).

How can I get all the groups that satisfy the match, not just one?

For the example above, that would be group1 and group3 for "AXYZ", and group2 and group3 for "BXYZ".

The above is only an example; in the real system there are different patterns (3+ letters each) and more complicated input text (1000+ words).

Upvotes: 1

Views: 2800

Answers (4)

Kobi

Reputation: 138137

The question seems a little abstract, but if you insist on a single regex you can do something like this, using optional lookaheads:

(?=(?'group1'A|C)?)(?=(?'group2'B|C)?)(?=(?'group3'A|B|C)?)

Lookaheads match without consuming any characters, so the overall match will be empty in this case, but the groups will be populated as you expect, and they may overlap.

Working example: http://ideone.com/PTtQu
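As a rough sketch of how the lookahead pattern could be used from C#: the class and the MatchedGroups helper below are hypothetical names introduced for illustration, and the helper only inspects the first (empty) match, which sits at the start of the input. To check other positions in a longer text, one option would be to iterate over Regex.Matches instead.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LookaheadGroups
{
    // Pattern from the answer: each named group sits in its own optional lookahead.
    const string Pattern = @"(?=(?'group1'A|C)?)(?=(?'group2'B|C)?)(?=(?'group3'A|B|C)?)";

    // Hypothetical helper: names of the groups that succeed at the start of the input.
    public static string[] MatchedGroups(string input)
    {
        // The first match is always the empty match at index 0,
        // because every lookahead is optional.
        Match m = Regex.Match(input, Pattern);
        return new[] { "group1", "group2", "group3" }
            .Where(name => m.Groups[name].Success)
            .ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", MatchedGroups("AXYZ"))); // group1, group3
        Console.WriteLine(string.Join(", ", MatchedGroups("BXYZ"))); // group2, group3
    }
}
```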

Upvotes: 2

robyaw

Reputation: 2320

The regex you have there will only match single characters; as soon as a match has been found on a character, the regex moves on to the next character in the input string. In your example, 'B' will never be matched by 'group2' or 'group3', as it will always be matched by 'group1'. Similarly, 'A' will never be matched by 'group3' for the same reason.

One way of getting the outcome you require using regexes is to treat each group as a separate regex and use Regex.IsMatch() on each one. For counts, the following C# does what I think you're asking for:

string input = "AXYZ";
int count = 0;

count += Regex.IsMatch(input, "A|B") ? 1 : 0;
count += Regex.IsMatch(input, "B|C") ? 1 : 0;
count += Regex.IsMatch(input, "A|B|D") ? 1 : 0;

Console.WriteLine(count); // prints 2
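If the names of the matching groups are needed rather than a count, the same one-regex-per-group idea might be sketched as follows. The class, the NamedPatterns table, and the MatchingGroups helper are hypothetical names; the patterns are taken from the question. Note that Regex.IsMatch searches the whole input, so a group counts as matched if its pattern occurs anywhere in the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class PerGroupMatch
{
    // Hypothetical table pairing each group name with its own standalone pattern.
    static readonly (string Name, string Pattern)[] NamedPatterns =
    {
        ("group1", "A|C"),
        ("group2", "B|C"),
        ("group3", "A|B|C"),
    };

    // Returns the names of all patterns that match somewhere in the input.
    public static List<string> MatchingGroups(string input)
    {
        return NamedPatterns
            .Where(p => Regex.IsMatch(input, p.Pattern))
            .Select(p => p.Name)
            .ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", MatchingGroups("AXYZ"))); // group1, group3
        Console.WriteLine(string.Join(", ", MatchingGroups("BXYZ"))); // group2, group3
    }
}
```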

Upvotes: 1

Ahmad Mageed

Reputation: 96557

The regex engine is eager, which means it will always return the left-most match and stop matching once a match is found. To demonstrate, consider this sample:

string input = "Hello World";
string pattern = "Hello|Hello World";
Console.WriteLine(Regex.Match(input, pattern).Value); // prints "Hello"
pattern = "Hello World|Hello";
Console.WriteLine(Regex.Match(input, pattern).Value); // prints "Hello World"

In your case group1 is matched first, so none of the other groups are attempted and their Success is false. Also, you claim that "BXYZ" returns group2, but this can't be right. Both "AXYZ" and "BXYZ" get matched by group1: (?'group1'A|B). If you need to test each group, you'll have to do so using a separate regex for each.
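To see this behaviour directly, here is a small sketch that lists which groups report Success for the alternation as quoted in this answer (with group1 as A|B). The class and the SuccessfulGroups helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // Pattern as quoted in this answer: three named groups joined by alternation.
    const string Pattern = @"(?'group1'A|B)|(?'group2'B|C)|(?'group3'A|B|C)";

    // Hypothetical helper: comma-separated names of groups that succeeded in the first match.
    public static string SuccessfulGroups(string input)
    {
        Match m = Regex.Match(input, Pattern);
        var names = new List<string>();
        foreach (string name in new[] { "group1", "group2", "group3" })
        {
            if (m.Groups[name].Success)
            {
                names.Add(name);
            }
        }
        return string.Join(", ", names);
    }

    static void Main()
    {
        Console.WriteLine(SuccessfulGroups("AXYZ")); // group1
        Console.WriteLine(SuccessfulGroups("BXYZ")); // group1
        Console.WriteLine(SuccessfulGroups("CXYZ")); // group2
    }
}
```

Only the first alternative that matches at a given position gets its group filled in; the later alternatives are never tried there.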

Upvotes: 0

Adam Dymitruk

Reputation: 129762

I believe you have to make the regex "greedy". Here is some info on it:

http://blogs.msdn.com/b/ericgu/archive/2005/08/19/453869.aspx

Upvotes: 0
