codekandis

Reputation: 806

RegEx: forcing retrieval of an empty named group

I'm implementing a router class that resolves URI patterns in order to dispatch each request to the proper controller.

Consider the following simple test cases and my expectations:

/test                              -> should match with ID ""
/test/                             -> should match with ID ""
/test/some-id                      -> should match with ID "some-id"
/test/some-id/                     -> should match with ID "some-id/"
/test/some-id/trailing-id-data     -> should match with ID "some-id/trailing-id-data"
/test/some-id/trailing-id-data/    -> should match with ID "some-id/trailing-id-data/"
/test-some-id                      -> should not match

It's important that cases 1 and 2 match with an optional trailing slash, and both should return an empty ID (I'm not sure that's possible with a regex; my gut says otherwise). If a trailing ID is present, cases 3 to 6 should match. Case 7 must not match.

I used the following script to test my regex:

<?php
$pattern = '...';
$values = [
  '/test',
  '/test/',
  '/test/some-id',
  '/test/some-id/',
  '/test/some-id/trailing-id-data',
  '/test/some-id/trailing-id-data/',
  '/test-some-id'
];
foreach ( $values as $value )
{
  $matches = [];
  var_export( ( bool ) preg_match( $pattern, $value, $matches ) );
  echo "\n";
  var_export( $matches );
  echo "\n\n";
}

First, without yet expecting an empty ID, I tried several regexes and settled on the following one as working:

/^\/test(\/?|(\/(?<id>.*)))$/

true
array (
  0 => '/test',
  1 => '',
)

true
array (
  0 => '/test/',
  1 => '/',
)

true
array (
  0 => '/test/some-id',
  1 => '/some-id',
  2 => '/some-id',
  'id' => 'some-id',
  3 => 'some-id',
)

true
array (
  0 => '/test/some-id/',
  1 => '/some-id/',
  2 => '/some-id/',
  'id' => 'some-id/',
  3 => 'some-id/',
)

true
array (
  0 => '/test/some-id/trailing-id-data',
  1 => '/some-id/trailing-id-data',
  2 => '/some-id/trailing-id-data',
  'id' => 'some-id/trailing-id-data',
  3 => 'some-id/trailing-id-data',
)

true
array (
  0 => '/test/some-id/trailing-id-data/',
  1 => '/some-id/trailing-id-data/',
  2 => '/some-id/trailing-id-data/',
  'id' => 'some-id/trailing-id-data/',
  3 => 'some-id/trailing-id-data/',
)

false
array (
)

Now I'm struggling with the second part: retrieving an empty ID. For cases 1 and 2 the 'id' key is missing from $matches altogether.

Is this possible in general, and if so, how?

Upvotes: 0

Views: 357

Answers (2)

mickmackusa

Reputation: 47874

This pattern will accurately match your strings as desired:

Pattern: ~^/test(?:$|/)\K.*~ in just 81 steps (compared to 114 steps for Marcos' pattern)

This matches /test; if any characters follow, the first one must be /, after which zero or more characters are matched. The \K means "start the full match from this point", so you can avoid a capture group and read the ID straight from the full-match element (index 0) of the array that preg_match() fills.
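If you would rather keep a named group as in the question, the same structure works without \K. A minimal sketch, assuming PHP 7.1+ (for the ?string return type); extractId() is a hypothetical name. Because the separator group (?:/|$) is consumed before the named group, (?<id>.*) always participates in the match, so it captures an empty string for /test and /test/ instead of disappearing from $matches.

```php
<?php
// Hypothetical helper: returns the ID for a "/test" route, or null if the path does not match.
function extractId(string $path): ?string
{
    // (?:/|$) requires a slash or the end of the string right after "/test",
    // so "/test-some-id" fails. The named group always participates, so it
    // holds '' for "/test" and "/test/".
    if (preg_match('~^/test(?:/|$)(?<id>.*)~', $path, $matches) !== 1) {
        return null;
    }
    return $matches['id'];
}

foreach (['/test', '/test/', '/test/some-id/', '/test-some-id'] as $path) {
    echo $path, ' -> ', var_export(extractId($path), true), "\n";
}
```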


PHP Code:

$values = [
  '/test',
  '/test/',
  '/test/some-id',
  '/test/some-id/',
  '/test/some-id/trailing-id-data',
  '/test/some-id/trailing-id-data/',
  '/test-some-id'
];
foreach($values as $v){
    echo "$v -> ";
    var_export(preg_match('~^/test(?:$|/)\K.*~', $v, $out) ? $out : 'failed');
    echo "\n";
}

Output:

/test -> array (0 => '')
/test/ -> array (0 => '')
/test/some-id -> array (0 => 'some-id')
/test/some-id/ -> array (0 => 'some-id/')
/test/some-id/trailing-id-data -> array (0 => 'some-id/trailing-id-data')
/test/some-id/trailing-id-data/ -> array (0 => 'some-id/trailing-id-data/')
/test-some-id -> 'failed'
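For a router that serves more than one prefix, the same \K pattern can be built from the route prefix at runtime. A sketch under the assumption that prefixes are literal strings; matchRoute() is a hypothetical name, and preg_quote() is used to escape regex metacharacters in the prefix, including the ~ delimiter.

```php
<?php
// Hypothetical router helper built on the \K pattern above.
// Returns the trailing ID (possibly '') if $path lives under $prefix, otherwise null.
function matchRoute(string $prefix, string $path): ?string
{
    // Escape the literal prefix, passing '~' so the delimiter is escaped too.
    $pattern = '~^' . preg_quote($prefix, '~') . '(?:$|/)\K.*~';
    return preg_match($pattern, $path, $out) === 1 ? $out[0] : null;
}

var_dump(matchRoute('/test', '/test/'));            // string(0) ""
var_dump(matchRoute('/api.v1', '/api.v1/users/7')); // string(7) "users/7"
var_dump(matchRoute('/api.v1', '/apixv1/users/7')); // NULL
```

Escaping matters here: without preg_quote(), the . in /api.v1 would also match the x in /apixv1.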

Upvotes: 3

Marcos Dimitrio

Reputation: 6852

Given your test cases, you could use something like this:

^\/test(?:\/?$|\/(?:(?<id>.*))$)

The trick is to combine the end anchor $ with alternation |: in (?:\/?$|\/..., the first branch matches the end of the string preceded by an optional slash, and only if that fails is the other branch tried.
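With this pattern the id group does not participate when the first branch matches, and, as the question's own output shows, preg_match() then leaves 'id' out of $matches. A minimal sketch of one way to still get the empty ID the question asks for, assuming PHP 7.1+ (null coalescing operator ?? and nullable return types); idFor() is a hypothetical name.

```php
<?php
// Hypothetical wrapper around the pattern above (written with '/' delimiters).
// Returns the ID ('' when there is none) or null when the path does not match.
function idFor(string $value): ?string
{
    $pattern = '/^\/test(?:\/?$|\/(?:(?<id>.*))$)/';
    if (preg_match($pattern, $value, $matches) !== 1) {
        return null;
    }
    // When the first branch (\/?$) matched, the id group did not participate
    // and is absent from $matches, so fall back to an empty string.
    return $matches['id'] ?? '';
}

foreach (['/test', '/test/', '/test/some-id', '/test-some-id'] as $value) {
    echo $value, ' -> ', var_export(idFor($value), true), "\n";
}
```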

I also used non-capturing groups (?:...) to avoid filling the array with unneeded matches.
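To see what the non-capturing groups save, compare the keys preg_match() returns for the question's pattern and for this one on the same input. A small sketch; the key lists in the comments follow from counting the capturing groups in each pattern and agree with the question's own output for /test/some-id.

```php
<?php
$value = '/test/some-id';

// The question's pattern: two unnamed capturing groups plus the named group.
preg_match('/^\/test(\/?|(\/(?<id>.*)))$/', $value, $withCapturing);

// The pattern above: only the named group captures.
preg_match('/^\/test(?:\/?$|\/(?:(?<id>.*))$)/', $value, $nonCapturing);

print_r(array_keys($withCapturing)); // keys: 0, 1, 2, 'id', 3
print_r(array_keys($nonCapturing));  // keys: 0, 'id', 1
```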

Lastly, I agree with @ArtisticPhoenix: there may be a better approach to interpreting routes than regex, especially since such a small requirement turns into a rather complicated expression. Something like MINI may give you some inspiration.
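As an illustration of the non-regex route, plain string functions are enough for these test cases. This is a sketch of one alternative, not how MINI or any other framework does it; matchPrefix() is a hypothetical name, and it assumes PHP 8+ for str_starts_with().

```php
<?php
// Hypothetical non-regex matcher (PHP 8+ for str_starts_with()).
// Returns the ID after $prefix ('' if none), or null if $path is not under $prefix.
function matchPrefix(string $prefix, string $path): ?string
{
    if ($path === $prefix) {
        return '';                      // e.g. "/test"
    }
    if (str_starts_with($path, $prefix . '/')) {
        // Everything after "/test/"; '' for "/test/" itself.
        return substr($path, strlen($prefix) + 1);
    }
    return null;                        // e.g. "/test-some-id"
}

foreach (['/test', '/test/', '/test/some-id/trailing-id-data', '/test-some-id'] as $path) {
    echo $path, ' -> ', var_export(matchPrefix('/test', $path), true), "\n";
}
```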

Upvotes: 0
