Reputation: 3167
I've a set of strings in this form:
NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }
(there can be more than four elements inside the braces)
I need to match the contents inside the braces and extract (using groups) only the 'AAA', 'BBB', ... substrings. So the result for this example would be
group1 : AAA
group2 : BBB
group3 : CCC-CC
group4 : DDD
I tried with this expression:
\{ (?:(\S+) \(\d+\),?\s?)+ \}
But it returns only the last matched group (so, in this case, only 'DDD'). What am I missing? Thanks
Upvotes: 2
Views: 3003
Reputation: 33908
If you are using .NET regex, your expression will work, because a .NET capturing group keeps every value it captured (available through the group's Captures collection). Otherwise you have to use a trickier regex or match this in two steps: first match the { ... } group, then the elements in it.
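As a rough illustration of the two-step approach, here is a minimal Python sketch. The function name extract_elements is made up for this example, and it assumes the string contains at most one { ... } group with no nested braces:

```python
import re

def extract_elements(text):
    """Return the element names inside the first { ... } group of text."""
    # Step 1: isolate the contents of the braces (assumes no nesting).
    block = re.search(r"\{([^{}]*)\}", text)
    if block is None:
        return []
    # Step 2: capture every name that is followed by a "(number)" counter.
    return re.findall(r"(\S+)\s+\(\d+\)", block.group(1))

print(extract_elements("NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }"))
# ['AAA', 'BBB', 'CCC-CC', 'DDD']
```

Because the second step only sees the text between the braces, NOOO (2) is never considered.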
The tricky regex would look like:
(?:{|\G(?!^),) # match a { or where the previous match ended followed by a ,
\s+ # space between elements
(\S+)\s+\(\d+\) # an element
(?=[^{]*}) # make sure it's eventually followed by a }
You can use that expression as written if you enable the /x flag (which can also be set by adding (?x) at the beginning of the expression).
The regex without the comments:
(?:{|\G(?!^),)\s+(\S+)\s+\(\d+\)(?=[^{]*})
This expression uses \G, which your regex flavor has to support. Many modern regex flavors have it, including Perl, PCRE (PHP etc.), and .NET.
Note that such an expression isn't perfect. For example, it would capture AAA and BBB in the following string:
{ AAA (1), BBB (23), CCC, something invalid here #¤% ))),,,,!! }
That can be fixed if required, though (except for the counter).
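One possible way to be stricter, sketched in Python under the two-step approach rather than with the \G regex: validate the whole brace body against a comma-separated list of elements before extracting anything. The names ELEMENT, STRICT_BODY and extract_strict are hypothetical, and this is only one interpretation of what "fixed" could mean here:

```python
import re

# One element: a name without whitespace, then a "(number)" counter.
ELEMENT = r"\S+\s+\(\d+\)"
# The whole brace body must be a comma-separated list of elements.
STRICT_BODY = re.compile(r"\s*" + ELEMENT + r"(?:\s*,\s*" + ELEMENT + r")*\s*")

def extract_strict(text):
    """Return element names only if the entire { ... } body is well formed."""
    block = re.search(r"\{([^{}]*)\}", text)
    if block is None or not STRICT_BODY.fullmatch(block.group(1)):
        return []  # reject the group as a whole instead of matching a prefix
    return re.findall(r"(\S+)\s+\(\d+\)", block.group(1))
```

With this check, a string that starts with valid elements but ends in garbage yields no elements at all, instead of a partial list.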
Upvotes: 3