DaveRandom

Reputation: 88657

Recursive path component collection with mod_rewrite

I am trying to do something that many people have done before, but I just can't get it working. I have spent nearly two days trawling the internet for a working example and found many very similar SO questions, but none of them work for me: most of them are after a key/value approach, and I just want a list of values.


What I want:

I want to be able to use search engine friendly URLs. Because of the nature of the way the site in question currently works, I want to convert this request URI:

/this/is/a/random/path

...to:

/index.php?p[]=this&p[]=is&p[]=a&p[]=random&p[]=path

So that when it arrives in PHP it will be available as an indexed array in $_GET['p']. I also want it to be tolerant of a trailing slash, so I would get the same result from:

/this/is/a/random/path/
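
To make the target concrete, here is a small sketch of how PHP turns that query string into an array. The function name pathComponentsFromQuery is made up for illustration; parse_str() follows the same bracket-parsing rules PHP applies when it fills $_GET.

```php
<?php
// Hypothetical helper (not part of the site's code): parse a query string
// using the same rules PHP applies to $_GET and return the 'p' entry.
function pathComponentsFromQuery(string $query): array
{
    parse_str($query, $params);
    return $params['p'] ?? [];
}

// Each "p[]=" appends one element, so the result is a 0-indexed list:
// ['this', 'is', 'a', 'random', 'path']
print_r(pathComponentsFromQuery('p[]=this&p[]=is&p[]=a&p[]=random&p[]=path'));
```

The order of the elements follows the order of the p[] parameters in the query string, so whatever builds the query string also decides the order of the array.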

How I have tried to do it:

I am not too bad with regex, and I have a reasonable understanding of how mod_rewrite works but I think I have disappeared so far up the wrong road that I can no longer see the way back.

Here is what I have currently:

# Turn mod_rewrite on
RewriteEngine On

# Allow direct loading of files in the /static directory
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^/?static/(.+)$ - [L]

# Recursively capture all path components
RewriteCond %{REQUEST_URI} !^/?(?:index\.php)?$
RewriteRule ^/?([^/]+)(?:/(.+)$|/?$) $2?p[]=$1 [QSA,L]

# Send request to controller
RewriteRule ^.*$ index.php [QSA]

What's wrong:

The first RewriteCond/RewriteRule pair works nicely - if I request a file that exists in the /static directory the request is left as it is and the file is served. If the file doesn't exist it falls through to the second set of rules so that I can display one of my sexy PHP-based error pages.

The problem lies with the second RewriteCond/RewriteRule pair, and possibly the third RewriteRule as well. The condition is supposed to be there to ensure that the final iteration does not cause the script name to be added to the array, and this seems to work. Here's what I think that second RewriteRule is doing, though I suspect I've missed something obvious:

           ^/? # Start of string with optional leading slash
       ([^/]+) # Capture all characters up to next slash
(?:/(.+)$|/?$) # Either grab all characters after the next slash or match the end

     $2?p[]=$1 # Push captured path component onto the array and shift URI down
       [QSA,L] # Merge previous query string, continue to next iteration
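
One rough way to check what the pattern captures on each pass, outside Apache, is to apply the same regex repeatedly in PHP. This is only a simulation of the matching step: traceCaptures is a made-up name, and the loop does not model how mod_rewrite actually handles [L], [QSA], or per-directory prefix stripping.

```php
<?php
// Hypothetical tracer: repeatedly applies the RewriteRule pattern and
// feeds the second capture group ($2) back in as the next path.
function traceCaptures(string $path): array
{
    $pattern = '#^/?([^/]+)(?:/(.+)$|/?$)#';
    $captured = [];
    while (preg_match($pattern, $path, $m)) {
        $captured[] = $m[1];   // what the rule would push as p[]=$1
        $path = $m[2] ?? '';   // remainder; unset when only /?$ matched
    }
    return $captured;
}

print_r(traceCaptures('/this/is/a/random/path/'));
```

If the list printed here looks right, the regex itself is probably fine and the remaining issues are more likely in how the flags and the query string interact between iterations.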

This is 90% working. Problems that I have:

Can anyone shed any light on why this is happening, or suggest a better way to do this?

Upvotes: 0

Views: 421

Answers (1)

Wrikken

Reputation: 70490

Wouldn't this be infinitely easier:

 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteRule .* rewrite.php [L]

rewrite.php:

 <?php
 $p = array_filter(explode('/',parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH)));
 // you _could_ of course do an EVIL $_GET['p'] = $p, but I prefer to leave 
 // the superglobals 'read-only'. Not touching $_GET does however mean
 // that index.php needs to be altered somewhat, allowing for a check on isset($p) 
 // and using that as input
 include 'index.php';
 ?>

Rewriting in apache is all well and good, but often just parsing & determining actions in PHP itself is a lot easier, and also easier to maintain / alter later on.

Questions / Remarks:

Your htaccess will allow direct access to files if I request them by path, which I don't want unless they are in the /static directory.

It does not allow any more or less access than there is at this time. With only your index.php & rewrite.php reachable, anything else can live outside the document root, which is where files you don't want to expose should reside anyway. Unless you are using this input to blindly include files in your index.php... I DID miss the part about requests for existing files that should also be piped to index.php. In that case, something like this would do:

RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^/?static/(.+)$ - [L]

RewriteCond %{REQUEST_URI} !^/?(index\.php)?$
RewriteRule .* rewrite.php [L,QSA]

By the way, what's the array_filter() with no callback for? So far as I can see all it will do is strip empty components and 0 components, and I would probably want to allow 0s.

It was to prevent empty 'ghosts' resulting from erroneous URLs like /foo//bar (notice the double //).

Would preg_split('#/+#', $str, -1, PREG_SPLIT_NO_EMPTY); be better?

If you want to allow 0 / other stuff that is filtered by array_filter, then yes, that solution would be better.
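
A quick comparison of the two approaches on a deliberately messy path shows the difference. The sample path is made up for illustration.

```php
<?php
// Illustrative input: doubled slash, a literal "0" component, trailing slash.
$path = '/foo//0/bar/';

// array_filter() with no callback drops every falsy value, including '0',
// and preserves the original keys, hence array_values() to reindex.
$filtered = array_values(array_filter(explode('/', $path)));

// preg_split() with PREG_SPLIT_NO_EMPTY drops only the empty strings.
$split = preg_split('#/+#', $path, -1, PREG_SPLIT_NO_EMPTY);

print_r($filtered); // ['foo', 'bar']
print_r($split);    // ['foo', '0', 'bar']
```

Without the array_values() call the array_filter() result would keep the gaps in its keys (1 and 4 here), which matters if index.php relies on positions.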

Upvotes: 1
