Zwo
Reputation: 1113

Regex capture group multiple times and other groups

I'm trying to write a regex that captures multiple groups of data.

Here is some sample data:

sampledata=X
B : xyz=1 FAB1_1=03 FAB2_1=01
A : xyz=1 FAB1_1=03 FAB2_1=01

I need to capture the X, which appears only once, as well as FAB1_1=03, FAB2_1=01, and so on: every string that starts with FAB.

I can capture all the "FAB" strings like this:

/(FAB[0-9]_[0-9]=[0-9]*)/sg

But I could not also capture X using this expression:

/sampledata=(?<samplegroup>[0-9A-Z]).*(FAB[0-9]_[0-9]=[0-9]*)/sg

This regex returns only one match: one group with X and one group with the last "FAB" string.

Upvotes: 1

Views: 6461

Answers (1)

Wiktor Stribiżew

Reputation: 627469

You can use

(?:sampledata=(\S+)|(?!^)\G)(?:(?!FAB[0-9]_[0-9]=).)*(FAB[0-9]_[0-9])=([0-9]*)

See the regex demo

The regex is based on the \G anchor, which matches either at the start of the string or at the end of the previous successful match. The negative lookahead (?!^) restricts it to the latter case.

So:

  • (?:sampledata=(\S+)|(?!^)\G) - match a literal sampledata= and capture one or more non-whitespace characters into Group 1 -OR- match at the end of the previous successful match
  • (?:(?!FAB[0-9]_[0-9]=).)* - match any text that does not start a FABn_n= sequence (this is a tempered greedy token); the . must also match line breaks here, so keep the s flag from the question
  • (FAB[0-9]_[0-9]) - Capture group 2, matching FAB followed by a digit, then a _, and one more digit
  • = - a literal =
  • ([0-9]*) - Capture group 3, matching zero or more digits
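
To see how the matches come out in practice, here is a minimal Java sketch (java.util.regex supports \G). The class name FabExtractor and the extract helper are invented for this illustration, and Pattern.DOTALL is assumed to play the role of the s flag from the question. Group 1 is only populated on the match that starts at sampledata=, so the sketch carries that value forward to the later matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FabExtractor {
    // DOTALL is assumed here so that '.' in the tempered token can cross line breaks.
    private static final Pattern FAB = Pattern.compile(
        "(?:sampledata=(\\S+)|(?!^)\\G)(?:(?!FAB[0-9]_[0-9]=).)*(FAB[0-9]_[0-9])=([0-9]*)",
        Pattern.DOTALL);

    // Hypothetical helper: returns one {sample, key, value} triple per FABn_n=... pair.
    static List<String[]> extract(String input) {
        List<String[]> rows = new ArrayList<>();
        String sample = null;
        Matcher m = FAB.matcher(input);
        while (m.find()) {
            // Group 1 is null on every match that started at \G.
            if (m.group(1) != null) {
                sample = m.group(1);
            }
            rows.add(new String[] {sample, m.group(2), m.group(3)});
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "sampledata=X\n"
            + "B : xyz=1 FAB1_1=03 FAB2_1=01\n"
            + "A : xyz=1 FAB1_1=03 FAB2_1=01";
        for (String[] row : extract(input)) {
            System.out.println(String.join(" ", row));
        }
        // prints:
        // X FAB1_1 03
        // X FAB2_1 01
        // X FAB1_1 03
        // X FAB2_1 01
    }
}
```

Each call to find() resumes where the previous match ended, which is what lets \G chain the FAB pairs onto the single sampledata= match.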

If you have only one sampledata= block, you can safely unroll the tempered greedy token (demo) as

(?:sampledata=(\S+)|(?!^)\G)[^F]*(?:F(?!AB[0-9]_[0-9]=)[^F]*)*?(FAB[0-9]_[0-9])=([0-9]*)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That way, the expression will be more efficient.
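
As a quick sanity check that the unrolled form finds the same matches as the tempered greedy token, here is a small Java sketch. The class UnrolledCheck and its matches helper are made up for this example. The lookahead in the unrolled part is written as (?!AB[0-9]_[0-9]=) because the F has already been consumed at that point; treat that as an assumption about the intended form. The unrolled pattern needs no DOTALL flag, since [^F] already matches line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledCheck {
    static final String HEAD = "(?:sampledata=(\\S+)|(?!^)\\G)";
    static final String TAIL = "(FAB[0-9]_[0-9])=([0-9]*)";

    // Tempered greedy token; DOTALL is assumed so '.' crosses line breaks.
    static final Pattern TEMPERED = Pattern.compile(
        HEAD + "(?:(?!FAB[0-9]_[0-9]=).)*" + TAIL, Pattern.DOTALL);

    // Unrolled variant: runs of non-F characters, plus any F that does not start FABn_n=.
    static final Pattern UNROLLED = Pattern.compile(
        HEAD + "[^F]*(?:F(?!AB[0-9]_[0-9]=)[^F]*)*?" + TAIL);

    // Hypothetical helper: flattens every match into "group1|group2|group3".
    static List<String> matches(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "sampledata=X\n"
            + "B : xyz=1 FAB1_1=03 FAB2_1=01\n"
            + "A : xyz=1 FAB1_1=03 FAB2_1=01";
        // Both patterns should yield the same list of matches.
        System.out.println(matches(TEMPERED, input).equals(matches(UNROLLED, input)));
    }
}
```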

If you have several sampledata blocks, enhance the tempered greedy token:

(?:sampledata=(\S+)|(?!^)\G)(?:(?!sampledata=|FAB[0-9]_[0-9]=).)*(FAB[0-9]_[0-9])=([0-9]*)

See another demo
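
To illustrate the multi-block case, here is a minimal Java sketch that groups the FAB pairs under the sampledata value they belong to. The class MultiBlockExtractor, the extract helper and the second block (sampledata=Y with FAB1_1=07) are invented for this example, and Pattern.DOTALL is again assumed to stand in for the s flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiBlockExtractor {
    // The extra sampledata= alternative stops the token from running into the next block.
    private static final Pattern FAB = Pattern.compile(
        "(?:sampledata=(\\S+)|(?!^)\\G)(?:(?!sampledata=|FAB[0-9]_[0-9]=).)*"
            + "(FAB[0-9]_[0-9])=([0-9]*)",
        Pattern.DOTALL);

    // Hypothetical helper: maps each sampledata value to its FABn_n=value strings.
    static Map<String, List<String>> extract(String input) {
        Map<String, List<String>> bySample = new LinkedHashMap<>();
        String sample = null;
        Matcher m = FAB.matcher(input);
        while (m.find()) {
            // A non-null group 1 marks the start of a new sampledata block.
            if (m.group(1) != null) {
                sample = m.group(1);
            }
            bySample.computeIfAbsent(sample, k -> new ArrayList<>())
                    .add(m.group(2) + "=" + m.group(3));
        }
        return bySample;
    }

    public static void main(String[] args) {
        String input = "sampledata=X\n"
            + "B : xyz=1 FAB1_1=03 FAB2_1=01\n"
            + "sampledata=Y\n"
            + "A : xyz=1 FAB1_1=07\n";
        System.out.println(extract(input));
        // prints: {X=[FAB1_1=03, FAB2_1=01], Y=[FAB1_1=07]}
    }
}
```

Without the sampledata= alternative in the lookahead, the \G branch would skip over sampledata=Y and the last pair would be attributed to X.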

Upvotes: 3
