Reputation: 30283
I installed some PHP software that added the following to my .htaccess:
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]
What is this doing?
My interpretation, which is obviously wrong: it's capturing http://foo.bar.com as http: and foo.bar.com, then replacing any character, ., with http:/foo.bar.com. Definitely not, right?
Upvotes: 1
Views: 515
Reputation: 785541
MrWhite has explained the rule in question very nicely. However, there is still a problem when a URL with multiple slashes is sent to your web server, e.g. /foo//bar////baz. Your rule would cause 4 redirects (5 requests in total) before resolving it to /foo/bar/baz.
There is some discussion in the comments below his answer about which rule would get this done in a single redirect.
Here is a rule that collapses every run of multiple slashes into a single slash with just one redirect:
RewriteEngine On
RewriteCond %{REQUEST_URI} //
RewriteRule ^.*$ /$0 [R=301,L,NE]
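It uses the back-reference $0 (the full match of the RewriteRule pattern). In .htaccess (per-directory) context, the URL-path that the mod_rewrite engine matches this pattern against already has multiple slashes reduced to one, so a single redirect is enough.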
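A rough Python model of why this takes only one hop. It assumes, as stated above, that the path the RewriteRule pattern sees in a root .htaccess has its leading slash stripped and its runs of slashes merged; the merging is imitated with a regex, not Apache's own code, and the helper names are hypothetical.

```python
import re
from typing import Optional


def per_dir_path(request_uri: str) -> str:
    """Hypothetical model of the path RewriteRule matches in a root .htaccess:
    runs of slashes merged and the leading slash removed (an assumption about
    Apache's behaviour, not its implementation)."""
    return re.sub(r"/{2,}", "/", request_uri).lstrip("/")


def single_redirect_target(request_uri: str) -> Optional[str]:
    """Location header for:
        RewriteCond %{REQUEST_URI} //
        RewriteRule ^.*$ /$0 [R=301,L,NE]
    or None when no redirect happens. Query strings are ignored."""
    if "//" not in request_uri:  # RewriteCond %{REQUEST_URI} //
        return None
    match = re.match(r"^.*$", per_dir_path(request_uri))  # RewriteRule ^.*$
    return "/" + match.group(0)  # substitution /$0


print(single_redirect_target("/foo//bar////baz"))  # /foo/bar/baz
print(single_redirect_target("/foo/bar/baz"))      # None
```

Because the back-reference is taken from the already-merged path rather than from REQUEST_URI, every surplus slash disappears in the first redirect.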
Upvotes: 3
Reputation: 45914
"It's capturing http://foo.bar.com as http: and foo.bar.com, then replacing any character, ., with http:/foo.bar.com. Definitely not, right?"
Right, definitely not. :)
That code reduces multiple slashes that appear together in the URL-path to a single slash. So a URL like example.com/foo//bar////baz becomes example.com/foo/bar/baz.
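The REQUEST_URI server variable contains the URL-path only (starting with a slash), e.g. /foo//bar////baz in the above example. %1 and %2 are backreferences to the captured groups in the last matched CondPattern (i.e. the strings either side of a double slash). Because the first (.*) is greedy, the double slash it splits on is the last one in the path, so only one slash is removed each time the rule runs.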
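A quick way to check which double slash the condition picks is to run the same pattern through Python's re module, whose backtracking for a greedy (.*) behaves like Apache's PCRE for this particular pattern. The function name is purely illustrative.

```python
import re

# Same pattern as the RewriteCond: ^(.*)//(.*)$
COND_PATTERN = re.compile(r"^(.*)//(.*)$")


def cond_backrefs(request_uri: str):
    """Return (%1, %2) for the RewriteCond, or None if it does not match."""
    m = COND_PATTERN.match(request_uri)
    return m.groups() if m else None


groups = cond_backrefs("/foo//bar////baz")
print(groups)                      # ('/foo//bar//', 'baz')
print(f"{groups[0]}/{groups[1]}")  # /foo//bar///baz  (the %1/%2 substitution)
```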
The single dot (.) in the RewriteRule pattern matches any single character, so this rule matches every URL except the document root, where the URL-path is empty.
Why check for multiple slashes? If these are requests for physical files, then Apache will implicitly reduce multiple slashes in order to serve the resource. So /foo//bar////baz.html will return the same as /foo/bar/baz.html. So, that's "good". However, these are technically different URLs, so search engines could perceive them as duplicate content. They could also break your application if you are parsing the URL for other purposes. This may or may not be a problem: such URLs only appear if someone links to you incorrectly, or if something in your web app is broken and generates them.
I would add that this method isn't particularly efficient, as it requires one external redirect per surplus slash (although you could argue that it is only intended to catch edge cases anyway). For example, given a request for /foo//bar////baz, the following redirects will occur:
1. /foo//bar////baz (initial request / redirect)
2. /foo//bar///baz (redirect)
3. /foo//bar//baz (redirect)
4. /foo//bar/baz (redirect)
5. /foo/bar/baz
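The chain above can be reproduced with a small Python loop that keeps applying the %1/%2 substitution while the condition still matches. This is a sketch under stated assumptions: the RewriteRule pattern "." is taken to match (the path is not the document root), query strings are ignored, and the function name is hypothetical.

```python
import re

COND_PATTERN = re.compile(r"^(.*)//(.*)$")  # the RewriteCond pattern


def redirect_chain(request_uri: str) -> list[str]:
    """Follow the RewriteRule . %1/%2 [R=301,L] redirects until no double
    slash remains. Assumes "." matches and ignores query strings."""
    chain = [request_uri]
    while (m := COND_PATTERN.match(chain[-1])) is not None:
        # Each pass replaces only the last "//" with "/".
        chain.append(f"{m.group(1)}/{m.group(2)}")
    return chain


chain = redirect_chain("/foo//bar////baz")
for url in chain:
    print(url)
print("redirects:", len(chain) - 1)  # redirects: 4
```

Each iteration shortens the path by exactly one character, so the loop always terminates, and the number of redirects equals the number of surplus slashes.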
Upvotes: 4