obento not ubuntu

Reputation: 51

Remove all query strings except for those already rewritten

I have all query strings which I need already rewritten to SEO friendly URLs, like

RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [L]
RewriteRule ^post_([0-9]+)/$ articles.php?id=$1 [L]
... and so on

but I would like to strip any other query strings like item_123/?foo=bar or database.php?foo=bar or post_123/?type=product&id=321 for both SEO and security reasons.

The apparently obvious solution of placing

RewriteCond %{QUERY_STRING} (.+)
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

at the end of .htaccess, to deal with everything that has not been dealt with before and stopped by [L] flags, actually breaks the original RewriteRule and redirects item_123/ to an empty database.php with no parameters.

Is it possible to remove all query strings except for those already mod_rewritten earlier without explicitly writing down exceptions for all pairs of %{REQUEST_URI}s and %{QUERY_STRING}s?

Edit:

Solution A

# You do not need this whole block if you're running Apache v2.3.9+
RequestHeader set SOME-FANCY-NAME-FOR-THE-HEADER-AS-DESCRIBED-IN-THE-ABOVE-LINK 1 env=END

RewriteCond %{HTTP:SOME-FANCY-NAME-FOR-THE-HEADER-AS-DESCRIBED-IN-THE-ABOVE-LINK} =1 [NV]
RewriteRule .* - [L]

As the [END] flag works only on Apache v2.3.9+, I used a workaround which would emulate this behaviour.

# Replace [L,E=END:1] with [END] if running Apache v2.3.9+
RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [L,E=END:1]

Simply rejecting any ? in THE_REQUEST in the first place means duplicate pages of the item_123/?foo=bar pattern are not found (404). The [L,E=END:1] flag tells mod_rewrite to stop the current iteration and reiterate; the next iteration triggers RewriteRule .* - [L] and keeps the request from reaching the potential loop further down. The [END] flag, if supported, would stop it straight away.
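
On Apache 2.2 a similar stop-after-the-first-rewrite effect is sometimes achieved without a custom request header, by relying on the fact that Apache prefixes environment variables with REDIRECT_ after an internal redirect. A minimal sketch, assuming the END variable set by the [L,E=END:1] flag survives as REDIRECT_END on the next pass (worth verifying on your own server before relying on it):

```apache
# Assumption: E=END:1 from the previous pass is visible as REDIRECT_END
# after the internal redirect. If so, stop rewriting immediately.
RewriteCond %{ENV:REDIRECT_END} =1
RewriteRule ^ - [L]

# Same rule as above: rewrite only clean requests, then mark the pass.
RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [L,E=END:1]
```

The guard has to come before every rule that sets E=END:1, otherwise the second pass reaches the later rules before it is stopped.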

RewriteCond %{QUERY_STRING} type=product
RewriteCond %{QUERY_STRING} id=([0-9]+)
RewriteRule ^database\.php$ http://www.example.com/item_%1/? [R=301,L]

This will also redirect (301) the potentially compromised duplicate pages of the database.php?type=product&foo=bar&id=123 pattern to the correct URL, regardless of gibberish parameters in the query string. Once the request reaches the correct URL, it stops there without causing a loop and a 500 error.

# If page is accessible without parameters

RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^catalog/$ database.php [L,E=END:1]

RewriteCond %{THE_REQUEST} ^GET\ [^?]+\?
RewriteRule ^database\.php$ http://www.example.com/catalog/? [R=301,L]

If the page is accessible without parameters such as type and id above but is requested as database.php?foo=bar or database.php?, it is redirected (301) to catalog/ without the query string. Again, a page of the catalog/?foo=bar pattern will not be found (404).

# If page is not accessible without parameters

RewriteCond %{THE_REQUEST} ^GET\ [^?]+\?
RewriteRule ^database(\.php|/)?$ database.php [L,E=END:1]

If the page is not accessible without parameters, we can force stop rewriting (to avoid unnecessary redirects later on if e.g. we have anyotherfile.php rewritten to anyotherfile/) and make the page send a 404 header itself once it knows that no valid parameters have been passed.

Solution A+B

The code from the accepted solution is correct by itself, while my version extends rewriting to match many other malformed patterns.

Adding the code from the accepted solution after all of the above code will capture the (previously) not found links of the item_123/?foo=bar and catalog/?foo=bar patterns and redirect them (301) to the correct URLs item_123/ and catalog/ without the query strings. This makes sense, as users will get where they want to go even if they follow a link modified by an RSS aggregator or the like. Changing %{QUERY_STRING} (.+) to %{THE_REQUEST} ^GET\ [^?]+\? and using %{THE_REQUEST} ^GET\ [^?]+$ instead of %{QUERY_STRING} ^$ in the above code will also remove trailing question marks (item_123/?), which would otherwise be overlooked and counted as duplicate pages if addressed.

RewriteCond %{THE_REQUEST} ^GET\ [^?]+\?
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

Upvotes: 1

Views: 665

Answers (3)

hakre

Reputation: 198203

The L flag does not stop processing. It re-injects the request if you changed the URL (which you did). So after every internal rewrite the rules run again, the very last condition is satisfied, and the very last rule is triggered:

RewriteCond %{QUERY_STRING} (.+)
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

Because this rule cuts away the query string (the substitution ends with ? and there is no QSA flag), you end up with the PHP script without parameters:

rewrite #1/1: item_5/ -> database.php?type=product&id=5
              L triggered, because URL changed, re-inject:
rewrite #1/2: database.php?type=product&id=5 -> http://www.example.com/database.php?
              R triggered, exiting

rewrite #2/1: http://www.example.com/database.php? -
              no rule matches, use as-is
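
The passes traced above can be observed on a real server by turning on mod_rewrite logging. A minimal sketch, assuming Apache 2.4 and access to the server or virtual host configuration (the LogLevel directive is not allowed in .htaccess); on Apache 2.2 the older RewriteLog and RewriteLogLevel directives served the same purpose:

```apache
# Apache 2.4: keep general logging at "warn", trace mod_rewrite in detail.
# The trace lines are written to the error log; lower the level when done.
LogLevel warn rewrite:trace3
```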

To avoid this, add a condition to the final rule so that it does not redirect .php files:

RewriteCond %{QUERY_STRING} (.+)
RewriteCond %{REQUEST_URI} !^/[a-z]+\.php$    
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

or, if you have a more modern Apache version (2.3.9 or later), just use the END flag:

RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [END]
RewriteRule ^post_([0-9]+)/$ articles.php?id=$1 [END]
... and so on
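
Put together, a minimal sketch of the END-based variant, assuming Apache 2.3.9 or later and an .htaccess file in the document root of www.example.com. Because [END] prevents the rewritten request from passing through the rules again, the query-string stripping rule can go first, where it only ever sees what the client actually requested:

```apache
RewriteEngine On

# 1. Any client request carrying a query string is redirected to the same
#    path with the query string removed (the trailing "?" drops it).
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

# 2. Clean SEO-friendly URLs are rewritten internally; END stops all
#    further rewrite processing, so rule 1 never sees the new query string.
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [END]
RewriteRule ^post_([0-9]+)/$ articles.php?id=$1 [END]
```

With this ordering item_123/?foo=bar is redirected to item_123/, while database.php?foo=bar ends up at database.php without parameters, so the script itself still has to decide what to return in that case.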

Upvotes: 2

user1646111

Reputation:

You can avoid this by using:

RewriteRule ^item_([0-9]+)/.*$ abc.php?type=product&id=$1 [L]

I added .* to match anything after the slash, but it is still a valid pattern for your redirect.

Upvotes: 0

hendr1x

Reputation: 1501

I don't know if this helps or not, but the way I handle things is to send requests for files that don't exist to a specific PHP file (rewrite.php):

RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^.*$ ./rewrite.php

This lets me handle pretty much every case I have come across easily.

Upvotes: 0
