Govind Rai

Reputation: 15800

Why does my htaccess code not work?

Could you please explain why my .htaccess code does not work? Beyond just getting the right code, I'm trying to better understand URL rewriting and redirecting, so I would appreciate a detailed explanation of the syntax and code. Most answers on SO simply state the solution with very little explanation.

# Hypertext Access Directives by Govind Rai

# First rewrite to HTTPS:
# Don't put www. here. If it is already there it will be included, if not
# the subsequent rule will catch it.
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Now, rewrite any request to the wrong domain to use www.
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule .* https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

###############last two directives that don't work#######################

# hide .html extension govie v1
RewriteCond %{THE_REQUEST} \.html$
RewriteRule ^/[^.]+\.html$ /$1 [NC,R=301,L]

#internal redirect to the right .html file
RewriteCond %{THE_REQUEST} !\.html$
RewriteRule ^/([^.]+)$ /$1.html [L]

I want to understand why the last two rules are not working. When I access a URL without the .html extension, I get a 404 Not Found error, and a URL with the extension is not rewritten to one without it. I've posted the entire file in case there are conflicting rules.

Upvotes: 2

Views: 203

Answers (3)

anubhava

Reputation: 785196

The problem is this condition:

RewriteCond %{THE_REQUEST} \.html$

That condition will never succeed, because %{THE_REQUEST} holds the raw HTTP request line as received by Apache, for example GET /index.php?id=123 HTTP/1.1. It always ends with the protocol version, never with .html.
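To see the difference concretely, here is a quick check with Python's re module against a sample request line. For these simple patterns Python's regex syntax behaves like the PCRE engine Apache uses; the request lines below are illustrative values, not taken from a real log.

```python
import re

# An example %{THE_REQUEST} value: method, path, protocol.
the_request = "GET /about.html HTTP/1.1"

# The question's condition: ".html" anchored at the very end of the string.
question_cond = re.compile(r"\.html$")

# anubhava's condition: capture the path before ".html",
# which must be followed by whitespace or the start of a query string.
answer_cond = re.compile(r"\s/+(.+?)\.html[\s?]", re.IGNORECASE)

print(bool(question_cond.search(the_request)))  # False: the line ends in "HTTP/1.1"

m = answer_cond.search(the_request)
print(m.group(1) if m else None)  # about

# It also matches when a query string follows the extension.
m2 = answer_cond.search("GET /docs/intro.html?lang=en HTTP/1.1")
print(m2.group(1) if m2 else None)  # docs/intro
```

The captured group is what the answer's rule later uses as %1.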

You can use these rules to fix your issue:

RewriteEngine On

## add www and turn on https in same rule

# if HOST name doesn't start with www. - OR
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
# if HTTPS is off
RewriteCond %{HTTPS} off
# *capture* hostname part after www in %1
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
# redirect with https://www.%1/... to always apply https and www
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]

## hide .html extension
# if original request is ending with .html then capture part before .html in %1
RewriteCond %{THE_REQUEST} \s/+(.+?)\.html[\s?] [NC]
# and redirect to %1 (part without .html)
RewriteRule ^ /%1 [R=302,NE,L]

# internally add .html if there is a matching .html file in your web root
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+?)/?$ $1.html [L]
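The combined www/HTTPS rule hinges on the third condition capturing the bare host name in %1. A small Python sketch of that decision, where the function canonical_url and its arguments are hypothetical stand-ins for %{HTTP_HOST}, %{HTTPS} and %{REQUEST_URI}:

```python
import re

# Same pattern as the third RewriteCond: optional "www." then capture the rest.
HOST_RE = re.compile(r"^(?:www\.)?(.+)$", re.IGNORECASE)

def canonical_url(host: str, https_on: bool, request_uri: str):
    """Return the redirect target, or None if no redirect is needed.

    Mirrors the [OR]-chained conditions: redirect if the host does not
    start with "www." (case-insensitive, like [NC]) OR HTTPS is off.
    """
    needs_redirect = not host.lower().startswith("www.") or not https_on
    if not needs_redirect:
        return None
    bare_host = HOST_RE.match(host).group(1)  # this plays the role of %1
    return f"https://www.{bare_host}{request_uri}"

print(canonical_url("example.com", False, "/about"))      # https://www.example.com/about
print(canonical_url("www.example.com", False, "/about"))  # https://www.example.com/about
print(canonical_url("example.com", True, "/about"))       # https://www.example.com/about
print(canonical_url("www.example.com", True, "/about"))   # None
```

Because %1 never includes the www. prefix, a single redirect covers every non-canonical combination, instead of the two-hop chain (first HTTPS, then www) produced by the rules in the question.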


Upvotes: 2

Dusan Bajic

Reputation: 10879

%{THE_REQUEST} contains the full HTTP request line sent by the browser to the server (e.g., GET /index.html HTTP/1.1), so it will never match \.html$ (it always ends with the protocol version, never with .html). Perhaps you can try:

RewriteCond %{THE_REQUEST} \.html\sHTTP
RewriteRule ^([^.]+)\.html$ /$1 [NC,R=301,L]

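# only rewrite when a matching .html file actually exists,
# so images, stylesheets and folders are left alone
RewriteCond %{REQUEST_URI} !\.html$
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_URI}.html [L]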
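A quick Python check of the first condition against two sample request lines (illustrative values). One thing it shows: because \.html\sHTTP requires whitespace right after .html, a request carrying a query string is not redirected, whereas a character class such as [\s?] (the one used in anubhava's answer) accepts both forms.

```python
import re

# Condition from this answer vs. the character-class variant.
dusan_cond = re.compile(r"\.html\sHTTP")
alt_cond = re.compile(r"\.html[\s?]")

plain = "GET /about.html HTTP/1.1"
with_query = "GET /about.html?ref=home HTTP/1.1"

for line in (plain, with_query):
    print(line, bool(dusan_cond.search(line)), bool(alt_cond.search(line)))
# GET /about.html HTTP/1.1 True True
# GET /about.html?ref=home HTTP/1.1 False True
```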

Upvotes: 0

arkascha

Reputation: 42925

The issue most likely is a simple one: when rewrite rules are used inside .htaccess style files, the path the RewriteRule pattern is matched against is relative to the directory, so it does not start with a slash. That means you have to modify your rule patterns slightly:

# enable rewriting
Options -MultiViews
RewriteEngine on
RewriteBase /

# internal rewrite to the right .html file
# (skip requests that address an existing file or folder)
RewriteCond %{REQUEST_URI} !\.html$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?([^.]+)$ /$1.html [END]

# hide .html extension govie v1
# (only when the client itself requested .html, so there is no loop)
RewriteCond %{THE_REQUEST} \.html[\s?]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^/?([^.]+)\.html$ /$1 [NC,R=301,END]

Instead of removing that leading slash completely, I prefer to make it optional by adding a question mark (^/?). This allows the same rules to be used inside the HTTP server's host configuration without modification.
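The effect of the leading slash can be checked with Python's re module. In .htaccess context the pattern sees a relative path such as about.html, while in the server or virtual host configuration it sees /about.html; the sample paths are illustrative. The check also shows that the question's pattern contains no parentheses, so its $1 has no group to refer to.

```python
import re

# The question's pattern: requires a leading slash and has no capture group.
question_rule = re.compile(r"^/[^.]+\.html$", re.IGNORECASE)

# This answer's pattern: optional leading slash, captures the name.
answer_rule = re.compile(r"^/?([^.]+)\.html$", re.IGNORECASE)

htaccess_path = "about.html"  # what a RewriteRule sees in .htaccess
vhost_path = "/about.html"    # what it sees in the server config

print(bool(question_rule.match(htaccess_path)))  # False: no leading slash
print(bool(question_rule.match(vhost_path)))     # True
print(question_rule.groups)  # 0: nothing for $1 to refer to

for path in (htaccess_path, vhost_path):
    print(answer_rule.match(path).group(1))  # about (both times)
```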

I also added the well-known twin conditions that check that the request does not address a physically existing file or folder. This is usually what you want, but you have to decide for yourself.
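As a rough model of what the !-f / !-d pair and the rule pattern decide together, here is a Python sketch. The function internal_target and the temporary directory layout are hypothetical; Apache performs the real checks against %{REQUEST_FILENAME}.

```python
import os
import re
import tempfile

PATTERN = re.compile(r"^/?([^.]+)$")  # same pattern as the internal rewrite

def internal_target(docroot: str, path: str):
    """Return the rewritten path, or None if the request is left alone."""
    full = os.path.join(docroot, path.lstrip("/"))
    if os.path.isfile(full) or os.path.isdir(full):
        return None  # a real file or folder: serve it as-is (!-f, !-d)
    m = PATTERN.match(path)
    if not m:
        return None  # contains a dot, e.g. style.css or about.html
    return "/" + m.group(1) + ".html"

with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "about.html"), "w").close()
    os.mkdir(os.path.join(root, "blog"))
    print(internal_target(root, "about"))       # /about.html
    print(internal_target(root, "blog"))        # None (existing folder)
    print(internal_target(root, "about.html"))  # None (existing file)
```

Like the rule, this sketch does not check that the target .html file exists, so a request for a missing page still ends in a 404; the extra %{REQUEST_FILENAME}.html -f condition in anubhava's answer is what avoids that.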


A general hint: you should always prefer to place such rules inside the HTTP server's real host configuration. .htaccess style files are notoriously error-prone, hard to debug, and they slow down the server, often without reason. They are only provided for situations where you do not have access to that configuration (read: really cheap hosting providers) or where your application needs to write its own rewriting rules (an obvious security nightmare).

Upvotes: 0
