TheUnexpected

Reputation: 3167

RegEx: grouping returns only the last match

I have a set of strings of this form:

NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }

(there can be more than four elements inside the braces)

I need to match the contents inside the braces and extract (using groups) only the 'AAA', 'BBB', ... substrings. So the result for this example would be:

group1 : AAA
group2 : BBB
group3 : CCC-CC
group4 : DDD

I tried with this expression:

\{ (?:(\S+) \(\d+\),?\s?)+ \}

But it returns only the last matched group (so, in this case, only 'DDD'). What am I missing? Thanks

Upvotes: 2

Views: 3003

Answers (1)

Qtax

Reputation: 33908

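If you are using .NET regex, your expression will work as it is, because a capturing group there keeps every value it captured (available through the group's Captures collection), not just the last one. In other flavors a repeated group retains only its last capture, so you have to use a trickier regex or match this in two steps: first match the { ... } block, then the elements inside it.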

The tricky regex would look like:

(?:{|\G(?!^),)   # match a { or where the previous match ended followed by a ,
\s+              # space between elements
(\S+)\s+\(\d+\)  # an element
(?=[^{]*})       # make sure it's eventually followed by a }

You can use that expression as written if you use the /x flag (which can also be set by adding (?x) at the beginning of the expression).

The regex without the comments:

(?:{|\G(?!^),)\s+(\S+)\s+\(\d+\)(?=[^{]*})

This expression uses \G, which your regex flavor has to support. Most modern regex flavors have it, including Perl, PCRE (PHP, etc.), and .NET.

Note that such an expression isn't perfect. It would capture AAA and BBB in the following string, for example:

{ AAA (1), BBB (23), CCC, something invalid here #¤% ))),,,,!! }

Although that can be fixed if required (except for the counter).
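
For flavors without \G (Python's built-in re module is one), the two-step approach mentioned at the top can be sketched roughly like this. This is only an illustration, not the one right way to do it: the names BLOCK, ELEMENT and extract_elements are made up for the example, and the block pattern assumes the elements are separated by a comma and always have the "NAME (digits)" shape. Because the first step validates the whole { ... } block, a malformed list like the one above yields nothing rather than a partial result.

```python
import re

# Step 1 pattern: a whole "{ NAME (n), NAME (n), ... }" block.
# Group 1 holds everything between the braces.
BLOCK = re.compile(r"\{\s*(\S+ \(\d+\)(?:,\s*\S+ \(\d+\))*)\s*\}")

# Step 2 pattern: a single "NAME (n)" element; group 1 is the name.
ELEMENT = re.compile(r"(\S+) \(\d+\)")


def extract_elements(text):
    """Return the element names from the first well-formed block in text."""
    block = BLOCK.search(text)
    if block is None:
        # No block, or the block contains something invalid.
        return []
    return ELEMENT.findall(block.group(1))


print(extract_elements("NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }"))
```

With the string from the question this should print the four names in order; "NOOO (2)" is skipped because it sits outside the braces.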

Upvotes: 3
