Reputation: 31
I have done a lot of research on removing subfolders, but I cannot find a way to write an .htaccess rule that removes all subfolders in my root directory. Example below:
www.domain.com/dan/dan changes to www.domain.com/dan
www.domain.com/pam/pam changes to www.domain.com/pam
www.domain.com/jam/jam changes to www.domain.com/jam
The .htaccess rule should apply this pattern to any subfolder without me having to add the subfolder names to the rule, like a wildcard or catch-all condition.
However, there is one condition: only remove the subfolder if the file has the same name as the subfolder, as illustrated in my example above.
I'm on Apache 1.3.42, so I need a solution that works on that version and does not depend on newer versions, please.
Check out my .htaccess file below. I've done a lot of SEO work on it, as you can see:
AddType application/x-httpd-php .html
RewriteEngine On
RewriteBase /
#non www to www
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
#removing trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ $1 [R=301,L]
#html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^\.]+)$ $1.html [NC,L]
#index redirect
#directory remove index.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.arkiq.com/ [R=301,L]
#directory remove index
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\ HTTP/
RewriteRule ^index http://www.arkiq.com/ [R=301,L]
#sub-directory remove index.html
RewriteCond %{THE_REQUEST} /index\.html
RewriteRule ^(.*)/index\.html$ /$1 [R=301,L]
#sub-directory remove index
RewriteCond %{THE_REQUEST} /index
RewriteRule ^(.*)/index /$1 [R=301,L]
#remove .html
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.*)\.html$ /$1 [R=301,L]
Let me know if you know how to forward all subfolders to their respectively named files with one rule as that would be superb.
Upvotes: 2
Views: 1132
Reputation: 20745
I have no setup here to test this rule against a real Apache installation, but I am pretty sure you can achieve this by using a positive lookahead with a capture group.
RewriteRule ^(.*?)([^/]+)/(?=\2(/|$))([^/]+)/?$ /$1$4 [R,L]
What does this do?
^(.*?) will match everything before the last two path segments. If you go to example.com/test/test, it matches exactly nothing.
([^/]+) will match the first thing we want to test and put it in capture group 2.
(?=\2(/|$)) is the positive lookahead. A lookahead will 'peek' at the next characters, but will not consume any. \2 is replaced with the contents of the second capture group, and (/|$) will match either a slash or the end of the string.
The last ([^/]+) will match the second 'thing', and /? will make sure that the URL is matched even if a / exists at the end of the URL.
After applying this rule, this should happen:
example.com/test/test --> example.com/test
example.com/test/test2 --> no rewrite, because '2' does not match '/' or the end of the string
example.com/test/test/ --> example.com/test
example.com/sub/test/test --> example.com/sub/test
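Since I cannot test on Apache here, the expectations above can at least be checked against another regex engine. The following is a minimal sketch in Python, whose re module also supports lazy quantifiers, backreferences and lookahead; it only approximates what mod_rewrite does and says nothing about Apache 1.3. The helper name collapse is made up for illustration, and paths are given without the leading slash, the way a RewriteRule in .htaccess sees them.

```python
import re

# Same pattern as the RewriteRule above.
PATTERN = re.compile(r"^(.*?)([^/]+)/(?=\2(/|$))([^/]+)/?$")


def collapse(path):
    """Return the redirect target for `path`, or None if the rule does not match.

    `collapse` is a hypothetical helper; it mimics the substitution /$1$4.
    """
    m = PATTERN.match(path)
    if m is None:
        return None
    return "/" + m.group(1) + m.group(4)


# The four cases listed above.
for path in ["test/test", "test/test2", "test/test/", "sub/test/test"]:
    print(path, "->", collapse(path))
```

One caveat this check brings up: because ^(.*?) is not anchored on a slash, a path such as xtest/test also matches in Python and comes out as /xtest. Starting the pattern with ^(.*/|), as the internal rewrite further down does, avoids that in this sketch.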
Debugging this rule
If you get an internal server error, please check your Apache error log and read the error it reports. Here is proof it works on a clean .htaccess on Apache 2.4.4; while it takes 1 minute to check an error log, it would take me several hours to read the patch notes for every Apache version of the last 3 years.
External redirect, internal rewrite, preventing infinite loop
Assuming that the above rule works on your version of mod_rewrite/Apache/regex, the following construction will externally redirect your request, then internally rewrite it back. Please note that /test/test will not do anything sensible unless you tell Apache how to execute such a file. Proof of concept.
#The external redirect
RewriteCond %{THE_REQUEST} ^(GET|POST)\ /(.*?)([^/]+)/(?=\3(/|\ ))
RewriteRule ^(.*?)([^/]+)/(?=\2(/|$))([^/]+)/?$ /$1$4 [R,L]
#The internal rewrite
RewriteCond %{REQUEST_URI} !^/(.*?)([^/]+)/(?=\2(/|$))([^/]+)/?$
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*/|)([^/]+)/?$ /$1$2/$2 [L]
You mention DirectorySlash Off. Please note that on current versions of Apache this would only get applied to an actual external request. While doing internal rewrites you are safe. In both examples above, on Apache 2.4.4, even though I redirect to a URL without a trailing slash, Apache will still append a slash in a second redirect. I have no idea how this was handled in 1.3.
If Apache 1.3 doesn't support backreferences or lookaround in its regex engine, which I still can't test, there is no real way to check via mod_rewrite whether a URL contains two identical segments. You'll either need to use a custom router page or write out every URL (which can cause performance issues, as there are likely a lot of them). Rewriting to a router page goes like this:
RewriteRule ^(.*)$ /myrouter.php?url=$1 [L]
This router page, in a language of your choice, can also send the 301 or 302 header with a custom Location. It will also need to handle all other requests that are matched by the RewriteRule above.
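The check such a router page would have to perform can be sketched as follows. This is Python rather than PHP, and the function name route and its return convention (a status code plus a Location value, or None) are assumptions for illustration, not part of any framework. It takes the value that the rule above passes in the url parameter.

```python
def route(url):
    """Decide whether `url` ends in two identical path segments.

    `route` is a hypothetical helper. It returns (301, location) when the
    last two segments are equal, and None when the request should be
    handled normally instead.
    """
    # Split on slashes and drop empty pieces, so leading and trailing
    # slashes do not matter.
    segments = [s for s in url.split("/") if s]
    if len(segments) >= 2 and segments[-1] == segments[-2]:
        # Drop the duplicated last segment and redirect to the rest.
        return 301, "/" + "/".join(segments[:-1])
    return None


print(route("dan/dan"))         # duplicated segment, so a redirect is returned
print(route("sub/test/test/"))  # trailing slash is ignored
print(route("test/test2"))      # segments differ, so no redirect
```

Because this compares whole segments with plain string equality, it does not depend on backreferences or lookahead in the server's regex engine.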
Upvotes: 2