tshabala

Reputation: 85

mod_rewrite: no ? and # in REQUEST_URI

What I'm trying to do: have pretty URLs in the format 'http://domain.tld/one/two/three', that get handled by a PHP script (index.php) by looking at the REQUEST_URI server variable.
In my example, the REQUEST_URI would be '/one/two/three'. (Btw., is this a good idea in general?)

I'm using Apache's mod_rewrite to achieve that.
Here's the RewriteRule I use in my .htaccess:

RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]

This works really well thus far; it forwards every REQUEST_URI that consists of a-z, A-Z or a '/' to /index.php, where it is processed.

Only drawback: '?' (question marks) and '#' (hash signs) still seem to be allowed in the REQUEST_URI, and possibly other characters that I have yet to find.
Is it possible to restrict those via my .htaccess with a suitable addition to the RewriteRule?

Thanks!

Upvotes: 2

Views: 2481

Answers (4)

cEz

Reputation: 5062

The fragment identifier, e.g. #some-anchor, is handled by the browser and is never sent to the server. JavaScript would be needed to redirect and remove it, although I am not sure why you would want to do so.

[SNIPPED after clarification] To rewrite only when the query string is empty:

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]

Upvotes: 1

Gumbo

Reputation: 655239

In mod_rewrite and PHP, the variable REQUEST_URI refers to two different parts of the URI. In mod_rewrite, %{REQUEST_URI} contains the current URI path; in PHP, $_SERVER['REQUEST_URI'] contains the URI path and query. In both cases the URI fragment is missing, because that part of the URI is not transmitted to the server and is only used by the client.

So, when /one/two/three?foo#bar is requested, mod_rewrite’s %{REQUEST_URI} contains /one/two/three and PHP’s $_SERVER['REQUEST_URI'] contains /one/two/three?foo.
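
To make the split concrete, here is a minimal sketch in Python (not PHP or Apache itself) that breaks the same example URL into the three parts described above. The function name `server_views` and the dictionary keys are made up for illustration. Note that this sketch does not preserve a bare trailing '?' with an empty query.

```python
from urllib.parse import urlsplit


def server_views(url_as_typed: str) -> dict:
    """Split a URL into the parts each layer would see (illustrative only)."""
    parts = urlsplit(url_as_typed)
    # The path alone corresponds to mod_rewrite's %{REQUEST_URI}.
    path = parts.path
    # Path plus query corresponds to PHP's $_SERVER['REQUEST_URI'].
    path_and_query = path + ("?" + parts.query if parts.query else "")
    return {
        "mod_rewrite_request_uri": path,
        "php_request_uri": path_and_query,
        # The fragment stays in the browser and is never sent to the server.
        "fragment_client_only": parts.fragment,
    }


print(server_views("/one/two/three?foo#bar"))
```

Since the fragment never reaches the server, no RewriteRule can see or reject it.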

Upvotes: 2

Adam Lukaszczyk

Reputation: 4926

If I understand correctly, you want to forbid the use of ? and # on your site?

You shouldn't do that, because:

  • the hash (#) is used by Google's specification for AJAX URLs,
  • the question mark (?) is used, for example, by Google AdWords and Analytics and by any affiliate program.

So if you force Apache to reject URL requests containing a question mark, people who click on your ad in AdWords will only see a 404 error page.

There is nothing wrong with letting people use both of them. What matters is protecting your site against XSS attacks.

Btw., there is another very important character: the percent sign (%), which is used to encode special characters (such as Polish or German national letters).

Upvotes: 0

Tim Stone

Reputation: 19169

The $_SERVER['REQUEST_URI'] variable will contain the original request URI as received by the server, before you perform the rewrite. Therefore it's impossible (as far as I know this early in the morning) to remove the query string portion from REQUEST_URI itself, but you naturally have the option of removing it when you process the $_SERVER['REQUEST_URI'] variable in your script.
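
As a rough sketch of that script-side cleanup, written in Python for illustration rather than PHP: drop everything from the first '?' onward, then split the remaining path into segments. The helper name `path_segments` is hypothetical. In PHP, `parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH)` should give you the path portion in a similar way.

```python
def path_segments(request_uri: str) -> list:
    """Return the path segments of a request URI, ignoring any query string."""
    # Everything from the first "?" onward is the query string; drop it.
    path = request_uri.split("?", 1)[0]
    # Split on "/" and discard empty pieces from leading or trailing slashes.
    return [segment for segment in path.split("/") if segment]


print(path_segments("/one/two/three?foo"))
```

This way a stray query string is silently ignored instead of breaking the routing.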

If you want to only perform your RewriteRule when the query string is not specified, the following should work:

RewriteCond %{QUERY_STRING} !^.+$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]

Note that this might be problematic, though. If someone links to your site with a URL that accidentally contains a query string, the rewrite never happens and your script never handles the request. The visitor then gets a 404 response (or whatever the case may be), which is less user-friendly than silently ignoring the trailing information.
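
To see which requests the condition and rule above would let through, here is a small Python simulation. It is only an approximation of how mod_rewrite evaluates them, and `would_rewrite` is a made-up name. The pattern is copied from the RewriteRule, with `re.IGNORECASE` standing in for the NC flag.

```python
import re

# Same pattern as the RewriteRule; IGNORECASE mimics the [NC] flag.
RULE_PATTERN = re.compile(r"^/?([a-zA-Z/]+)/?$", re.IGNORECASE)


def would_rewrite(path: str, query_string: str) -> bool:
    """Approximate RewriteCond %{QUERY_STRING} !^.+$ plus the RewriteRule."""
    # The condition only passes when the query string is empty.
    if query_string != "":
        return False
    # The rule is matched against the path only, never the query string.
    return RULE_PATTERN.match(path) is not None


print(would_rewrite("one/two/three", ""))     # letters and slashes, no query
print(would_rewrite("one/two/three", "foo"))  # same path with a query string
```

The second call is the case described above: the path is fine, but the query string stops the rewrite, so index.php never sees the request.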

Upvotes: 0
