dani jinji

Reputation: 471

PHP regular expression groups

I am new to PHP and regex, and I am facing a problem.

I have text that looks like this, for example: "FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsdf"

I want to fetch three groups: one for FIRST NAME, another for MORE DATA, and a third for EXTRA DATA. This is my regex:

FIRST NAME:(.*)MORE DATA:(.*)EXTRA DATA:(.*)

This is how I do it in Java.

Now, how do I match it in PHP so that I can echo something like:

echo "more data: " . matche(group(1));

for example.

thank you!

EDIT: What if the pattern repeats itself? For example: FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsdf FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsdf. How do I grab the groups in a while loop until it stops matching?

Upvotes: 4

Views: 3025

Answers (3)

Wiktor Stribiżew

Reputation: 626738

Your regex can only ever find one match per line, in any regex engine, because the last .* is greedy: it matches any characters other than a newline, zero or more times, so it consumes everything up to the end of the line.

If your input always has these three parts, repeated any number of times, use lazy dot matching and add a positive lookahead after the last group so that it stops at either the end of the string or the next FIRST NAME: marker:

FIRST NAME:(.*?)MORE DATA:(.*?)EXTRA DATA:(.*?)(?=$|FIRST NAME:)
           ^^^^^          ^^^^^           ^^^^^ ^^^^^^^^^^^^^^^^

See this regex demo

Here, (.*?) matches zero or more characters other than a newline, as few as possible, and the zero-width assertion (?=$|FIRST NAME:) makes the last .*? stop at whichever comes first: the end of the string or FIRST NAME:.

PHP note: in Java, you call Matcher#find in a while loop to get successive matches. In PHP, preg_match_all collects them all in one call:

// Single quotes keep PHP from ever treating "$" as the start of a variable.
$re = '/FIRST NAME:(.*?)MORE DATA:(.*?)EXTRA DATA:(.*?)(?=$|FIRST NAME:)/';
$str = 'FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsdf FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsdf';
preg_match_all($re, $str, $matches);
print_r($matches[1]); // Print Group 1 (one entry per match)
print_r($matches[2]); // Print Group 2
print_r($matches[3]); // Print Group 3
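
If you would rather walk the results record by record, which is closer to the Java while loop, a minimal sketch could pass the PREG_SET_ORDER flag so that each array element holds one complete match. The sample values (aaa, bbb, ...) and the names $records and $rows are made up for illustration:

```php
<?php
// Same lazy pattern as above.
$re = '/FIRST NAME:(.*?)MORE DATA:(.*?)EXTRA DATA:(.*?)(?=$|FIRST NAME:)/';
// Shortened, made-up sample input with two records.
$str = 'FIRST NAME: aaa MORE DATA: bbb EXTRA DATA: ccc FIRST NAME: ddd MORE DATA: eee EXTRA DATA: fff';

// PREG_SET_ORDER groups the results per match instead of per capture group.
$count = preg_match_all($re, $str, $records, PREG_SET_ORDER);

$rows = [];
foreach ($records as $record) {
    // $record[0] is the whole match; [1]..[3] are the groups.
    // trim() drops the spaces that surround each captured value.
    $rows[] = [
        'first' => trim($record[1]),
        'more'  => trim($record[2]),
        'extra' => trim($record[3]),
    ];
}

foreach ($rows as $row) {
    echo "first name: {$row['first']}, more data: {$row['more']}, extra data: {$row['extra']}\n";
}
```

Without the flag, preg_match_all uses PREG_PATTERN_ORDER, which is the layout printed by the print_r calls above.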

And if you are a regex optimization fan, unroll the lazy matching groups:

FIRST NAME:([^M]*(?:M(?!ORE DATA:)[^M]*)*)MORE DATA:([^E]*(?:E(?!XTRA DATA:)[^E]*)*)EXTRA DATA:([^F]*(?:F(?!IRST NAME:)[^F]*)*)

See the regex demo
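
As a quick sanity check, the sketch below runs the lazy and the unrolled pattern against the same single-line input and compares the results. The sample values (Mia, Eve, Finn, ...) are invented so that the values themselves contain M, E and F. One assumption to keep in mind: the negated character classes also match newlines, unlike the dot, so the two patterns are only expected to agree on single-line input:

```php
<?php
$lazy = '/FIRST NAME:(.*?)MORE DATA:(.*?)EXTRA DATA:(.*?)(?=$|FIRST NAME:)/';
$unrolled = '/FIRST NAME:([^M]*(?:M(?!ORE DATA:)[^M]*)*)MORE DATA:([^E]*(?:E(?!XTRA DATA:)[^E]*)*)EXTRA DATA:([^F]*(?:F(?!IRST NAME:)[^F]*)*)/';

// Made-up single-line sample; capital M, E and F inside the values exercise the unrolled loops.
$str = 'FIRST NAME: Mia MORE DATA: Eve EXTRA DATA: Finn FIRST NAME: ddd MORE DATA: eee EXTRA DATA: fff';

preg_match_all($lazy, $str, $lazyMatches);
preg_match_all($unrolled, $str, $unrolledMatches);

// Both patterns should produce the same whole matches and the same groups.
$same = ($lazyMatches === $unrolledMatches);
var_dump($same);
```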

Upvotes: 0

Will

Reputation: 24699

Let's do it like this:

preg_match('/FIRST NAME:\s*(.*?)\s*MORE DATA:\s*(.*?)\s*EXTRA DATA:\s*(.*)\s*/', $line, $matches);

Your match results will now be in $matches, like this:

php > var_dump($matches);
array(4) {
  [0]=>
  string(93) "FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsd"
  [1]=>
  string(20) "sdfksdfkjskdlfjlskdf"
  [2]=>
  string(19) "dsfkskldfjsdkfjsdkf"
  [3]=>
  string(17) "dsfksdfjlsdjfklsd"
}
php >

So now $matches[1] is the first group, $matches[2] the second, and so on. $matches[0] is the whole match.
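
To get the echo from the question, a minimal sketch could check the return value of preg_match before reading the groups. The variable names $line and $pattern are just placeholders:

```php
<?php
$line = 'FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsdf';
$pattern = '/FIRST NAME:\s*(.*?)\s*MORE DATA:\s*(.*?)\s*EXTRA DATA:\s*(.*)\s*/';

// preg_match returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $line, $matches) === 1) {
    echo "first name: " . $matches[1] . "\n";
    echo "more data: " . $matches[2] . "\n";
    echo "extra data: " . $matches[3] . "\n";
} else {
    echo "no match\n";
}
```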

Upvotes: 3

Miloš Đakonović

Reputation: 3871

Based strictly on your input:

$re = "/(FIRST NAME\\s*:)\\s*(.*)(MORE DATA\\s*:\\s*)(.*)(EXTRA DATA\\s*:\\s*)(.*)/"; 
$str = "FIRST NAME: sdfksdfkjskdlfjlskdf MORE DATA: dsfkskldfjsdkfjsdkf EXTRA DATA: dsfksdfjlsdjfklsdf"; 

preg_match_all($re, $str, $matches);

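Then inspect the $matches variable. The labels and values land in alternating groups: groups 1, 3 and 5 hold the labels FIRST NAME:, MORE DATA: and EXTRA DATA:, and groups 2, 4 and 6 hold the corresponding values (for example, $matches[2][0] is the first-name value). Because each (.*) is greedy, this pattern only suits input where the three parts occur once per line.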

Upvotes: 1
