bloodphp

Reputation: 117

Regex doesn't match, greediness

I'm trying to match two parts of a string with a regex in PHP, and I think greediness is getting in the way. I would like the first regex (see the comments) to give me the same first two captures as the second regex, while still matching both strings. What am I doing wrong?

I'm trying to get +123 (if cd: exists, as in the first string) and 456.

<?php

$data[] = 'longstring start waste cd:+123yz456z longstring';
$data[] = 'longstring start waste +yz456z longstring';
$regexs[] = '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/'; // first
$regexs[] = '/start[^z]*?(cd:([^y]+)y)[^z]*z([^z]*)z/';  // second

foreach ($regexs as $regex) {
  foreach ($data as $string) {
    if (preg_match($regex, $string, $match)) {
      // array_slice drops the full match ($match[0]) and keeps only the capture groups
      echo "Tried '$regex' on '$string' and got " . implode(',', array_slice($match, 1));
      echo "\n";
    }
  }
}
?>

Output is:

Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got ,,456
Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste +yz456z longstring' and got ,,456
Tried '/start[^z]*?(cd:([^y]+)y)[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got cd:+123y,+123,456

There is no fourth line since cd: is not present in the second string.

Expected output (I'm no expert, so I may be wrong), where only the first line differs from the actual output:

Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got cd:+123y,+123,456
Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste +yz456z longstring' and got ,,456
Tried '/start[^z]*?(cd:([^y]+)y)[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got cd:+123y,+123,456

Upvotes: 1

Views: 162

Answers (1)

Chriszuma

Reputation: 4557

Okay, so you want to capture +123 if there is a cd:, and always 456? Here's how I would do it:

$data[] = 'longstring start waste cd:+123yz456z longstring';
$data[] = 'longstring start waste +yz456z longstring';

$regexs[] = '/start.+?(?:cd:(.+?)y)?.*?z(.+?)z/';

The idea is to use non-greedy (?) quantifiers liberally, so that each part matches as little as possible.

Also note the (?:) non-capturing group. Those are very useful.
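A quick way to check this pattern is to run it against both sample strings and print only the capture groups. The sketch below reuses the strings from the question; the `$results` variable is just an illustrative name. A lazy quantifier placed directly before an optional group gives the engine a shorter route: it lets `.+?` consume a single character, skips the optional group because `cd:` does not start there, and lets the following `.*?` absorb everything up to the first `z`. The match succeeds without ever capturing `+123`.

```php
<?php
// Sample strings taken from the question.
$data = [
    'longstring start waste cd:+123yz456z longstring',
    'longstring start waste +yz456z longstring',
];

// The pattern suggested above: lazy prefix, then an optional cd: group.
$regex = '/start.+?(?:cd:(.+?)y)?.*?z(.+?)z/';

$results = [];
foreach ($data as $string) {
    if (preg_match($regex, $string, $match)) {
        // Drop the full match ($match[0]) and keep only the capture groups.
        $results[] = implode(',', array_slice($match, 1));
    }
}

// Group 1 stays empty for both strings, so each line reads ",456".
echo implode("\n", $results), "\n";
```

Since the first path the engine tries already succeeds, it never backtracks to a longer `.+?` that would line up with `cd:`.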

EDIT: Apparently that doesn't work, so let's try a different approach with an "either/or" group:

$regexs[] = '/start.+?(?:cd:(.+?)yz(.+?)z|\+yz(.+?)z)/';
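One thing to watch with the either/or pattern is that each alternative has its own group numbers, so 456 lands in group 2 when `cd:` is present and in group 3 when it is not. The second alternative also hard-codes the `+yz` from the sample data. As a possible alternative (a sketch, not something tested beyond these two strings), the lazy prefix can be moved inside the optional group, so the engine has to try reaching `cd:` before it is allowed to skip the group. The `captures()` helper and the variable names are made up for illustration, and the pattern assumes, like the ones in the question, that no `z` appears between `start` and the first delimiter.

```php
<?php
// Sample strings taken from the question.
$data = [
    'longstring start waste cd:+123yz456z longstring',
    'longstring start waste +yz456z longstring',
];

// Hypothetical helper: returns the capture groups (without the full match)
// as a comma-separated string, or null when the pattern does not match.
function captures(string $regex, string $string): ?string
{
    if (!preg_match($regex, $string, $match)) {
        return null;
    }
    return implode(',', array_slice($match, 1));
}

// The either/or pattern from the edit above.
$eitherOr = '/start.+?(?:cd:(.+?)yz(.+?)z|\+yz(.+?)z)/';

// Alternative sketch: the lazy prefix lives inside the optional group.
$wrapped = '/start(?:[^z]*?cd:([^y]+)y)?[^z]*z([^z]*)z/';

$eitherOrResults = array_map(fn($s) => captures($eitherOr, $s), $data);
$wrappedResults  = array_map(fn($s) => captures($wrapped, $s), $data);

echo implode("\n", $eitherOrResults), "\n"; // "+123,456" then ",,456"
echo implode("\n", $wrappedResults), "\n";  // "+123,456" then ",456"
```

With the second pattern, `+123` (when present) is always group 1 and `456` is always group 2, which keeps the calling code simpler.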

Upvotes: 1
