Joe

Reputation: 447

Matching multiple sporadic groups in a regex

I'm dealing with some legacy code that stores its data in a proprietary string format and I'm trying to create a regex to make parsing this format much easier.

What I'm having trouble with is that the format contains groups that can repeat many times, sporadically. For example, the data typically looks like this: (A)(B)(B)(B). But sometimes it has multiple (A)'s, like (A)(B)(B)(B)(A)(B)(B), or even (A)(A)(B)(B)(B). The number of repetitions of (B) varies too, from none at all to a great many.

My current regex works fine when the data looks like (A)(B)(B)..., but it breaks when another (A) appears later in the string. The first (A) gets captured, but the remaining (A)'s don't.

So basically right now I have a regex that has a group for parsing (A)'s and a group for parsing (B)'s and these groups work fine independently, but I can't figure out how to combine these with the correct repetition syntax between them so that dispersed matches get found, instead of only the first one and the rest being ignored.

Am I just missing something, or do I have to break my regex up into two separate ones and parse out the (A)'s and (B)'s separately? (I'm using C#/.NET.)

Upvotes: 2

Views: 425

Answers (3)

Jesus Kevin Morales

Reputation: 123

I would place each individual part of what you want to match in its own group, separating the groups with a | character. Then I would write a function/method using a switch statement. That way, you can tell whether group 1 or group 2 matched and react to the different results.
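A minimal sketch of that idea in C#, assuming the two sub-patterns are the literal tokens `(A)` and `(B)` (hypothetical stand-ins, since the real patterns aren't shown in the question). The group names `a` and `b` and the helper `Classify` are made up for illustration, and a plain if/else stands in for the switch:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for the real (A) and (B) sub-patterns,
// joined with | so each match is exactly one token of either kind.
const string Pattern = @"(?<a>\(A\))|(?<b>\(B\))";

// Walk every match in input order and record which alternative produced it.
List<string> Classify(string input)
{
    var kinds = new List<string>();
    foreach (Match m in Regex.Matches(input, Pattern))
    {
        if (m.Groups["a"].Success)
        {
            kinds.Add("A"); // react to an (A) token here
        }
        else if (m.Groups["b"].Success)
        {
            kinds.Add("B"); // react to a (B) token here
        }
    }
    return kinds;
}

Console.WriteLine(string.Join(",", Classify("(A)(B)(B)(A)(B)"))); // A,B,B,A,B
```

Because `Regex.Matches` returns every match in order, a later (A) is found just like the first one.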

Upvotes: 0

HBP

Reputation: 16063

It would help to see your current regexp.

To match any sequence of A's or B's use the following

           (A*B*)*

That is, any number of groups of A's followed by any number of B's.

This also matches the empty string. To ensure there is at least some data:

           (A|B)(A*B*)*

Or, if the data always starts with an A (as in all your examples):

           A(A*B*)*
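A quick way to compare the three patterns is to run them against a few sample strings. This is only a sketch: A and B are treated as the literal letters here, the helper `Matches` is made up, and the `^(?:...)$` wrapper is added so that the whole string has to match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps the pattern in ^(?:...)$ so the entire input must match.
bool Matches(string pattern, string input) =>
    Regex.IsMatch(input, "^(?:" + pattern + ")$");

Console.WriteLine(Matches("(A*B*)*", ""));          // True: the empty string is accepted
Console.WriteLine(Matches("(A|B)(A*B*)*", ""));     // False: at least one A or B is required
Console.WriteLine(Matches("A(A*B*)*", "ABBBABB"));  // True: starts with A, then any mix
Console.WriteLine(Matches("A(A*B*)*", "BAB"));      // False: does not start with A
```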

Upvotes: 0

Welbog

Reputation: 60458

If you have a working pattern that matches (A) and another that matches (B), then the expression to match any number of either is

(?:(A)|(B))*

There's no need to get fancy if that's all you need. This expression matches either (A) or (B) any number of times, but leaves the capturing of the groups to the A and B level.
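Since the question is about .NET: there, each group keeps every repetition it captured in its `Captures` collection, so a single match of this expression can report all the (A)'s and (B)'s, not just the last ones. A minimal sketch, again assuming the literal tokens `(A)` and `(B)` as hypothetical sub-patterns; the group names, the anchors, and the helper `CaptureIndexes` are additions for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same shape as (?:(A)|(B))*, with named groups and anchors so the
// whole string must consist of (A) and (B) tokens.
const string Whole = @"^(?:(?<a>\(A\))|(?<b>\(B\)))*$";

// Return the start index of every capture made by the named group.
int[] CaptureIndexes(string input, string groupName)
{
    Match m = Regex.Match(input, Whole);
    if (!m.Success)
    {
        return Array.Empty<int>();
    }
    return m.Groups[groupName].Captures
        .Cast<Capture>()
        .Select(c => c.Index)
        .ToArray();
}

Console.WriteLine(string.Join(",", CaptureIndexes("(A)(B)(B)(A)(B)", "a"))); // 0,9
Console.WriteLine(string.Join(",", CaptureIndexes("(A)(B)(B)(A)(B)", "b"))); // 3,6,12
```

The capture indexes make it possible to reconstruct the original order of the tokens if that matters for parsing.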

Upvotes: 4
