Stann

Reputation: 13948

named groups in PHP pcre regex

I'm trying to match strings like these:

/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing
/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing/

and

/2011/10/4545
/2011/10/4545/

I want to get the year, month and third segment back. This is the regex I've got:

%/(?P<year>\d{4})/(?P<month>\d{2})/((?P<id>\d{1,})|(?P<permalink>.{1,}))[/]{0,1}$%

I thought the resulting matches array would always contain three variables: year, month, and either id or permalink. But when permalink is matched, I still get an empty id variable in the resulting array. Is there a way to rewrite the regex so the resulting array only contains year, month, and either id or permalink?
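A minimal reproduction of the behavior described above, assuming PHP 7 or later. The pattern is copied unchanged from the question, and only the named keys are kept for printing, because preg_match also adds a numeric key for every group.

```php
<?php
// Reproduces the question's behavior (sketch; assumes PHP 7+).
// The pattern below is copied unchanged from the question.
$pattern = '%/(?P<year>\d{4})/(?P<month>\d{2})/((?P<id>\d{1,})|(?P<permalink>.{1,}))[/]{0,1}$%';

$urls = [
    '/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing',
    '/2011/10/4545',
];

$results = [];
foreach ($urls as $url) {
    preg_match($pattern, $url, $m);
    // Keep only the string (named) keys; numeric keys are ints.
    $results[$url] = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
    print_r($results[$url]);
}
```

For the permalink URL, id comes back as an empty string. For the numeric URL, permalink is not empty but absent: PHP leaves out unmatched groups that come after the last group that matched.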

Upvotes: 2

Views: 2829

Answers (3)

Mob

Reputation: 11098

You don't necessarily need a regex.

    $x = "/2011/10/4545";
    $v = explode("/", $x);
    array_shift($v);         // drop the empty string before the leading "/"
    if (end($v) === "") {    // a trailing "/" leaves an empty last element
        array_pop($v);
    }
    print_r($v);

Outputs

Array
(
    [0] => 2011
    [1] => 10
    [2] => 4545
)

    $url = "/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing";
    $v = explode("/", $url);
    array_shift($v);         // drop the empty string before the leading "/"
    if (end($v) === "") {    // a trailing "/" leaves an empty last element
        array_pop($v);
    }
    print_r($v);

Outputs

Array
(
    [0] => 2011
    [1] => 10
    [2] => Lorem-ipsum-dolor-it-amet-consectetur-adipisicing
)

Upvotes: 1

Jon

Reputation: 437336

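Because they are present in the regex, the named groups are included in the match array even if they did not match anything because of the |; an unmatched group is reported as an empty string. (The exception is a group at the very end of the pattern: PHP leaves it out when it doesn't match, so with your regex permalink is simply missing when id matches.)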

You may also want to improve the regex a bit by substituting [^/] for the . in <permalink>; otherwise the greedy . swallows a trailing slash (if present) and makes it part of the permalink.
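A sketch of that change, assuming PHP 7 or later. Besides [^/]+ for the permalink, this version writes [/]{0,1} as /? and makes the outer group non-capturing with (?:...); those two tweaks are cosmetic and are not part of the suggestion above.

```php
<?php
// Compare the question's pattern with the [^/] variant (sketch; assumes PHP 7+).
$original = '%/(?P<year>\d{4})/(?P<month>\d{2})/((?P<id>\d{1,})|(?P<permalink>.{1,}))[/]{0,1}$%';
$improved = '%/(?P<year>\d{4})/(?P<month>\d{2})/(?:(?P<id>\d+)|(?P<permalink>[^/]+))/?$%';

$url = '/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing/';

preg_match($original, $url, $before);
preg_match($improved, $url, $after);

echo $before['permalink'], "\n"; // greedy . also captures the trailing slash
echo $after['permalink'], "\n";  // [^/]+ stops before it; /? consumes the slash
```

Only the permalink value changes; when the permalink branch matches, id is still present as an empty string.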

However, as Mob notes, there's a much easier way to parse such a simple target:

    list($year, $month, $link) = array_slice(explode('/', $url), 1);
    if (is_numeric($link)) {
        // $link is the id
    } else {
        // $link is the permalink
    }
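The same idea wrapped in a small function that returns the keyed array the question asked for. parseArchivePath() is a hypothetical name, and the sketch assumes the URL always has at least three segments (otherwise list() assigns null and PHP raises a notice or warning).

```php
<?php
// Sketch: explode()-based parsing returning year, month and id OR permalink.
// parseArchivePath() is a hypothetical helper name.
function parseArchivePath(string $url): array
{
    // explode() yields ['', year, month, segment], plus a trailing '' when the
    // URL ends in '/'; array_slice() drops the leading '' and list() ignores extras.
    list($year, $month, $link) = array_slice(explode('/', $url), 1);

    // Same test as above: numeric segments are ids, anything else is a permalink.
    $key = is_numeric($link) ? 'id' : 'permalink';

    return ['year' => $year, 'month' => $month, $key => $link];
}

print_r(parseArchivePath('/2011/10/4545/'));
print_r(parseArchivePath('/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing'));
```

Note that is_numeric() also accepts strings such as "1e3" or "12.5"; ctype_digit() is a stricter test if ids are always plain runs of digits.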

Upvotes: 1

JJJ

Reputation: 33163

I believe named groups aren't "ignored" when using the | syntax because capture groups are numbered by their position in the pattern, not by which alternative matched, so every group has a place in the result. The | itself does short-circuit much like a conditional or in most programming languages: alternatives are tried from left to right, and once one lets the whole pattern match, the rest are not tried. A group inside an alternative that wasn't taken simply comes back as an empty string (or is left out, if it is the last group in the pattern).

As an example, if you have a regular expression

/(?P<foo>abc)|(?P<bar>def)/

and the string to compare against is abcdef, collecting all matches (for example with preg_match_all) gives a first match, abc, that sets foo and a second match, def, that sets bar. Since the same pattern can fill foo in one match and bar in another, it's better for the result to have a fixed layout so that the programmer doesn't first have to check whether each group has been set before handling it.

As for the question "Is there a way to rewrite a regex so resulting array will only contain year, month and id or permalink": why would you want that? Just check whether the variable is empty. If the regex left one of them out, you'd still need to check which of them is set, and the exact same logic can be used to check which of them is empty.
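A sketch of that check, assuming PHP 7.4 or later (for the arrow function). routeParams() is a hypothetical name, and the pattern is the one from the question. Because the group that didn't match is either an empty string or missing altogether, the helper keeps only the four named keys and then drops empty values, which also produces the reduced array the question asked for.

```php
<?php
// Sketch: keep year, month and whichever of id/permalink actually matched.
// routeParams() is a hypothetical helper; requires PHP 7.4+ for fn().
function routeParams(string $url): ?array
{
    $pattern = '%/(?P<year>\d{4})/(?P<month>\d{2})/((?P<id>\d{1,})|(?P<permalink>.{1,}))[/]{0,1}$%';
    if (!preg_match($pattern, $url, $m)) {
        return null; // not an archive-style URL
    }
    // Drop numeric keys; 'permalink' may be absent, so intersect rather than index.
    $named = array_intersect_key($m, array_flip(['year', 'month', 'id', 'permalink']));
    // The group that did not take part is '' (if present at all); remove it.
    return array_filter($named, fn($v) => $v !== '');
}

$params = routeParams('/2011/10/4545/');
if (isset($params['id'])) {
    echo "id: {$params['id']}\n";
} else {
    echo "permalink: {$params['permalink']}\n";
}
```

Callers then branch on isset($params['id']) versus isset($params['permalink']), which is the same "which one is empty" check in another form.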

Upvotes: 4
