Reputation: 51
I have already rewritten all the query strings I need to SEO-friendly URLs, like this:
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [L]
RewriteRule ^post_([0-9]+)/$ articles.php?id=$1 [L]
... and so on
but I would like to strip any other query strings like item_123/?foo=bar or database.php?foo=bar or post_123/?type=product&id=321 for both SEO and security reasons.
The apparently obvious solution of placing
RewriteCond %{QUERY_STRING} (.+)
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
at the end of .htaccess, to deal with everything that has not been handled and stopped by an earlier [L] flag, actually breaks the original RewriteRule: item_123/ is redirected to a bare database.php with no parameters.
Is it possible to remove all query strings except for those already mod_rewritten earlier without explicitly writing down exceptions for all pairs of %{REQUEST_URI}s and %{QUERY_STRING}s?
# You do not need this whole block if you're running Apache v2.3.9+
RequestHeader set SOME-FANCY-NAME-FOR-THE-HEADER-AS-DESCRIBED-IN-THE-ABOVE-LINK 1 env=END
RewriteCond %{HTTP:SOME-FANCY-NAME-FOR-THE-HEADER-AS-DESCRIBED-IN-THE-ABOVE-LINK} =1 [NV]
RewriteRule .* - [L]
As the [END] flag works only on Apache v2.3.9+, I used a workaround that emulates its behaviour.
# Replace [L,E=END:1] with [END] if running Apache v2.3.9+
RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [L,E=END:1]
Simply rejecting any ? in THE_REQUEST in the first place means that duplicate pages of the item_123/?foo=bar pattern are not found (404). The [L,E=END:1] flag tells mod_rewrite to stop the current iteration and start over; the next iteration triggers RewriteRule .* - [L], which keeps the request from reaching the potential loop further down. The [END] flag, if supported, would stop rewriting straight away.
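For servers that do support [END], the comment in the rule above suggests a direct substitution. The following is only a sketch of that variant, assuming Apache 2.3.9 or later; with it, the header-based block at the top should not be needed.

```apache
# Sketch for Apache 2.3.9+ only: [END] replaces the [L,E=END:1] workaround.
# The condition matches only when the request line contains no "?" at all.
RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [END]
```

Unlike [L], [END] also keeps the rewritten database.php?type=product&id=... from being run through the rules again in per-directory (.htaccess) context.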
RewriteCond %{QUERY_STRING} type=product
RewriteCond %{QUERY_STRING} id=([0-9]+)
RewriteRule ^database\.php$ http://www.example.com/item_%1/? [R=301,L]
This will also redirect (301) the potentially compromised duplicate pages of the database.php?type=product&foo=bar&id=123 pattern to the correct URL, regardless of gibberish parameters in the query. Once the request reaches the correct URL, it stops there without causing a loop and a 500 error.
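The same pair of rules can presumably be repeated for the question's other mapping, post_123/ to articles.php?id=123. This is a sketch, not part of the original answer: it assumes the same [L,E=END:1] workaround block is in place, and the anchored id pattern is my own tightening.

```apache
# Hypothetical counterpart for the question's post_ rule.
RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^post_([0-9]+)/$ articles.php?id=$1 [L,E=END:1]

# (?:^|&) keeps "id=" from matching inside another name such as "uid=";
# (?:&|$) requires the value to consist of digits only.
RewriteCond %{QUERY_STRING} (?:^|&)id=([0-9]+)(?:&|$)
RewriteRule ^articles\.php$ http://www.example.com/post_%1/? [R=301,L]
```

Here %1 refers to the digits captured by the RewriteCond, since the other groups are non-capturing.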
# If page is accessible without parameters
RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^catalog/$ database.php [L,E=END:1]
RewriteCond %{THE_REQUEST} ^GET\ [^?]+\?
RewriteRule ^database\.php$ http://www.example.com/catalog/? [R=301,L]
If the page is accessible without parameters (such as the type and id used above) but is requested as database.php?foo=bar or database.php?, it will be redirected (301) to catalog/ without the query string. Again, a page of the catalog/?foo=bar pattern will not be found (404).
# If page is not accessible without parameters
RewriteCond %{THE_REQUEST} ^GET\ [^?]+\?
RewriteRule ^database(\.php|/)?$ database.php [L,E=END:1]
If the page is not accessible without parameters, we can force rewriting to stop (to avoid unnecessary redirects later on if, for example, we have anyotherfile.php rewritten to anyotherfile/) and let the page send a 404 header itself once it knows that no valid parameters have been passed.
The code from the accepted solution is correct by itself, while my version extends rewriting to match many other malformed patterns.
Adding the code from the accepted solution after all of the above code will capture the (previously) not found links of the item_123/?foo=bar and catalog/?foo=bar patterns and redirect them (301) to the correct URLs item_123/ and catalog/ without the query strings. This makes sense, as users will get where they want to go even if they follow a link modified by an RSS aggregator or the like. Changing %{QUERY_STRING} (.+) to %{THE_REQUEST} ^GET\ [^?]+\?, along with using %{THE_REQUEST} ^GET\ [^?]+$ instead of %{QUERY_STRING} ^$ in the above code, will also remove trailing question marks (item_123/?), which would otherwise be overlooked and counted as duplicate pages if addressed.
RewriteCond %{THE_REQUEST} ^GET\ [^?]+\?
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
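Put together, the ordering matters: internal rewrites first, canonical redirects second, the catch-all last. The sketch below assembles the item rules in that order for the simpler case of Apache 2.3.9+, using [END] in place of the header workaround. It is an illustration under that assumption, not a drop-in file.

```apache
RewriteEngine On

# 1. Pretty URL requested without any "?": rewrite internally and stop for good.
RewriteCond %{THE_REQUEST} ^GET\ [^?]+$
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [END]

# 2. Script requested directly with usable parameters: redirect to the pretty URL.
#    %1 is the id captured by the last RewriteCond.
RewriteCond %{QUERY_STRING} type=product
RewriteCond %{QUERY_STRING} id=([0-9]+)
RewriteRule ^database\.php$ http://www.example.com/item_%1/? [R=301,L]

# 3. Anything else whose request line contains "?": strip the query string.
RewriteCond %{THE_REQUEST} ^GET\ [^?]+\?
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

With this order, item_123/?foo=bar falls through to rule 3, is redirected to item_123/, and the follow-up request is served by rule 1. Note that every condition starts with GET, so other request methods are not covered by these rules.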
Upvotes: 1
Views: 665
Reputation: 198203
The L flag does not stop rewriting entirely. If a rule changed the URL (which yours did), the result is re-injected and run through the rules again. So after every internal rewrite you did, that very last condition matches and the very last rule is triggered:
RewriteCond %{QUERY_STRING} (.+)
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Because this rule cuts off the query string (the substitution ends with ? and there is no QSA flag), you end up with the PHP script without parameters:
rewrite #1/1: item_5/ -> database.php?type=product&id=5
L triggered, because URL changed, re-inject:
rewrite #1/2: database.php?type=product&id=5 -> http://www.example.com/database.php?
R triggered, exiting
rewrite #2/1: http://www.example.com/database.php? -
no rule matches, use as-is
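To watch this re-injection on a real server, mod_rewrite's own logging can be turned up. A minimal sketch, assuming Apache 2.4 and access to the server or virtual-host configuration (this directive is not allowed in .htaccess):

```apache
# Server or virtual-host configuration (Apache 2.4), not .htaccess.
# Writes mod_rewrite's rule-by-rule processing to the error log.
LogLevel alert rewrite:trace3
```

On Apache 2.2 the equivalent was the RewriteLog and RewriteLogLevel directives.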
Instead, you need to add a condition to that last rule so that it does not redirect .php files:
RewriteCond %{QUERY_STRING} (.+)
RewriteCond %{REQUEST_URI} !^/[a-z]+\.php$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Or, if you have a more recent Apache version (2.3.9 or later), just use the END flag:
RewriteRule ^item_([0-9]+)/$ database.php?type=product&id=$1 [END]
RewriteRule ^post_([0-9]+)/$ articles.php?id=$1 [END]
... and so on
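A different technique, not used in this answer, is to let the catch-all fire only on the initial request by testing REDIRECT_STATUS. This is a sketch that assumes the usual behaviour where Apache leaves that variable empty on the first pass and sets it after an internal rewrite; it is worth verifying on your own setup.

```apache
# Alternative sketch: act only on the original client request.
# REDIRECT_STATUS is empty until an internal rewrite has happened.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

Unlike the .php exclusion above, this variant also strips the query string from direct requests such as database.php?type=product&id=5, which may or may not be what you want.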
Upvotes: 2
Reputation:
You can avoid this by using:
RewriteRule ^item_([0-9]+)/.*$ abc.php?type=product&id=$1 [L]
I added .* to match anything after the slash, and it is still a valid pattern for your rule.
Upvotes: 0
Reputation: 1501
I don't know if this helps, but the way I handle this is to send requests for files that don't exist to a specific PHP file (rewrite.php):
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^.*$ ./rewrite.php
This lets me handle pretty much every case I have come across easily.
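A commonly seen form of the same front-controller idea is sketched below. It is a variant, not this answer's exact code: it tests REQUEST_FILENAME and adds an explicit [L] flag.

```apache
RewriteEngine On
# Skip requests that map to an existing directory or file.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# Send everything else to rewrite.php.
RewriteRule ^ rewrite.php [L]
```

Because the substitution contains no ?, the original query string is passed along unchanged, so rewrite.php has to validate or discard unwanted parameters itself.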
Upvotes: 0