Andrew Cheong

Reputation: 30283

How does a RewriteRule work when the first argument is just a dot?

I installed some PHP software that added the following to my .htaccess:

RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]

What is this doing?

My interpretation, which is obviously wrong: It's capturing http://foo.bar.com as http: and foo.bar.com, then replacing any character, ., with http:/foo.bar.com. Definitely not, right?

Upvotes: 1

Views: 515

Answers (2)

anubhava

Reputation: 785541

MrWhite has explained the rule in your question very nicely. However, there is still a problem when a URL with multiple slashes, e.g. /foo//bar////baz, is sent to your web server: your rule would cause 4 redirects before resolving it to /foo/bar/baz.

There is some discussion in the comments section below his answer about which rule would get this done in a single redirect.

Here is a rule that collapses every run of multiple slashes in a URL into a single slash, in a single redirect:

RewriteEngine On

RewriteCond %{REQUEST_URI} //
RewriteRule ^.*$ /$0 [R=301,L,NE]

It uses $0, the backreference to the whole match of the RewriteRule pattern. mod_rewrite matches that pattern against a URL-path in which multiple slashes have already been collapsed, so $0 is already free of them.
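To see the effect of that rule outside Apache, here is a rough Python model. It is only a sketch: the function names are made up for illustration, it assumes the .htaccess file sits in the document root (so the path seen by the RewriteRule pattern has no leading slash), and it imitates the slash collapsing with a regex rather than reproducing what mod_rewrite does internally.

```python
import re


def collapse_slashes(url_path):
    """Hypothetical helper: model the slash-collapsed path the pattern sees."""
    return re.sub(r"/{2,}", "/", url_path)


def single_redirect_target(request_uri):
    """Return the redirect target, or None when no redirect is issued."""
    # RewriteCond %{REQUEST_URI} //
    # REQUEST_URI still contains the slashes exactly as requested.
    if "//" not in request_uri:
        return None
    # RewriteRule ^.*$ /$0
    # Assumption: $0 is the collapsed path without its leading slash.
    matched = collapse_slashes(request_uri).lstrip("/")
    return "/" + matched


print(single_redirect_target("/foo//bar////baz"))  # /foo/bar/baz
print(single_redirect_target("/foo/bar/baz"))      # None
```

The point of the model is that the condition looks at the raw request while the substitution is built from the cleaned-up match, which is why one redirect is enough.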

Upvotes: 3

MrWhite

Reputation: 45914

It's capturing http://foo.bar.com as http: and foo.bar.com, then replacing any character, ., with http:/foo.bar.com. Definitely not, right?

Right, definitely not. :)

That code reduces multiple slashes that appear together in the URL-path to a single slash. So a URL like example.com/foo//bar////baz becomes example.com/foo/bar/baz.

The REQUEST_URI server variable contains the URL-path only (starting with a slash), e.g. /foo//bar////baz in the above example. %1 and %2 are backreferences to the captured groups in the last matched CondPattern, i.e. the strings on either side of a double slash. Because the first (.*) is greedy, that is the last double slash in the URL-path.

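The single dot (.) in the RewriteRule pattern matches any single character, and because the pattern is not anchored, one character anywhere in the URL-path is enough. So this rule matches every URL except the document root, where the URL-path that the pattern is tested against is empty.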
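The two directives can be imitated with Python's re module to see what ends up in %1 and %2. This is a minimal sketch, not Apache itself: redirect_target and its rule_path argument are hypothetical names, and rule_path stands for the path the RewriteRule pattern is tested against in .htaccess (assumed here to have no leading slash and no repeated slashes).

```python
import re

# RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
COND_PATTERN = re.compile(r"^(.*)//(.*)$")


def redirect_target(request_uri, rule_path):
    """Return the redirect target of one pass of the rule, or None."""
    # RewriteRule .  -> needs at least one character in the path
    if not re.search(r".", rule_path):
        return None
    # The condition is only evaluated once the rule pattern has matched.
    match = COND_PATTERN.search(request_uri)
    if match is None:
        return None
    # Substitution %1/%2: the double slash is replaced by a single one.
    return f"{match.group(1)}/{match.group(2)}"


m = COND_PATTERN.search("/foo//bar////baz")
print(m.group(1))  # /foo//bar//
print(m.group(2))  # baz
print(redirect_target("/foo//bar////baz", "foo/bar/baz"))  # /foo//bar///baz
```

Note that only one slash is removed per pass, and it is taken from the last run of slashes because the first group is greedy.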

Why check for multiple slashes? If these are requests for physical files, Apache implicitly reduces multiple slashes in order to serve the resource, so /foo//bar////baz.html returns the same response as /foo/bar/baz.html. So, that's "good". However, these are technically different URLs, so search engines could perceive them as duplicate content. It could also break your application if you are parsing the URL for other purposes. This may or may not be a problem in practice, since it would require users to link to you incorrectly (unless something broke in your web app that resulted in these URLs being generated).


I would add that this method isn't particularly efficient, as it requires multiple external redirects if there are many additional slashes (although you could argue that it is only intended to catch edge cases anyway). For example, given a request for /foo//bar////baz, the following redirects will occur:

  1. /foo//bar////baz (Initial request / redirect)
  2. /foo//bar///baz (redirect)
  3. /foo//bar//baz (redirect)
  4. /foo//bar/baz (redirect)
  5. /foo/bar/baz
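The sequence above can be reproduced with a small Python simulation that applies the same regex repeatedly. It is only a sketch under the assumption that each redirect is followed by the client and re-evaluated by the same rule; next_location and redirect_chain are illustrative names, not anything Apache provides.

```python
import re


def next_location(request_uri):
    """One pass of: RewriteCond ^(.*)//(.*)$  +  RewriteRule . %1/%2"""
    match = re.match(r"^(.*)//(.*)$", request_uri)
    if match is None:
        return None  # no double slash left, so no redirect
    return f"{match.group(1)}/{match.group(2)}"


def redirect_chain(request_uri):
    """List every URL-path requested, starting with the initial one."""
    chain = [request_uri]
    # Each pass shortens the path by one slash, so the loop terminates.
    while (target := next_location(chain[-1])) is not None:
        chain.append(target)
    return chain


chain = redirect_chain("/foo//bar////baz")
for step, path in enumerate(chain, start=1):
    print(step, path)
print("redirects:", len(chain) - 1)  # redirects: 4
```

Five requests are made in total, four of which are answered with a 301.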

Upvotes: 4
