Reputation: 11
Google has indexed my old URLs, which no longer exist. When someone clicks one of the old links in Google's search results, they get a 500 server error.
My old URL was https://example.org/idn-poker/, which has recently changed to https://example.org/idn-poker.
The problem is the trailing /.
How can I solve this issue?
My .htaccess:
<IfModule mime_module>
AddHandler application/x-httpd-ea-php72 .php .php7 .phtml
ErrorDocument 404 https://example.org
RewriteEngine on
RewriteCond %{THE_REQUEST} \.html [NC]
RewriteRule ^(.*)\.html$ /$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]
</IfModule>
Upvotes: 3
Views: 1134
Reputation: 45829
If your URLs have changed (from a trailing slash to no trailing slash) then you should 301 redirect from the old URL to the new one. Otherwise Google will continue to index the old URLs whilst you are internally linking to the new ones, creating a duplicate content issue. So, this is a necessary step to preserve SEO.
For example:
# Redirect to remove trailing slash on URLs (excluding directories)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.+)/$ /$1 [R=301,L]
However, there are other issues with your directives...
ErrorDocument 404 https://example.org
This causes any URL that would otherwise trigger a 404 to be 302 redirected to your homepage. This generally delivers a bad user experience. It's likely to be seen as a soft-404 by Google, so there is no SEO advantage in doing this. It also pollutes your server logs with a mass of 302s and doubles the requests to your server.
You should configure a custom 404 page with a meaningful message to your users, for example:
ErrorDocument 404 /error-documents/my-custom-404.html
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]
These directives will result in a rewrite-loop (500 error) when requesting a URL with a trailing slash, since REQUEST_FILENAME does not hold the same value as the $1 backreference. So whilst the condition might be true, the rewrite that follows is invalid. See my answer to the following ServerFault question for more information on this: https://serverfault.com/questions/989333/using-apache-rewrite-rules-in-htaccess-to-remove-html-causing-a-500-error
The immediate problem can be resolved by redirecting the URL with a trailing slash (as mentioned above). However, a rewrite-loop could still occur for other (possibly invalid) URLs.
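To illustrate, here is a rough trace of how the loop might unfold for the URL from the question. It assumes that idn-poker.html exists in the document root and that there is no directory named idn-poker; <docroot> is only a placeholder for the real document root path.

```apache
# Hypothetical trace for a request to /idn-poker/
#
# Pass 1: the RewriteRule pattern sees "idn-poker/", so $1 = "idn-poker/".
#         REQUEST_FILENAME is expected to resolve to <docroot>/idn-poker
#         (the trailing "/" ends up as path-info), so the test
#         "<docroot>/idn-poker.html -f" succeeds.
#         Substitution: "idn-poker/.html" (not the file that was tested)
#
# Pass 2: the pattern sees "idn-poker/.html". REQUEST_FILENAME still
#         resolves to <docroot>/idn-poker, so the conditions succeed again.
#         Substitution: "idn-poker/.html.html"
#
# ...and so on, with another ".html" appended on every pass.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]
```

Apache aborts the request once its internal redirect limit is reached (LimitInternalRecursion, 10 by default), which is what surfaces as the 500 error.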
There is also no need to check that the request does not map to a directory or file and then that it does exist as a file once the .html extension is appended. You are doing 3 times the work when only a single condition is necessary:
RewriteCond %{DOCUMENT_ROOT}/$0\.html -f
RewriteRule .+[^/]$ $0.html [L]
The regex .+[^/]$ simply avoids matching directories that end in a trailing slash, so avoiding the additional filesystem check in the condition.
The $0 backreference contains the full match (which we know does not end in a slash).
This rewrite should be used with the external redirect above that removes the trailing slash from non-directories.
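As a rough illustration, this is how the pattern would be expected to treat a few URL-paths. The paths are hypothetical examples (only idn-poker comes from the question) and are shown as they appear in .htaccess, that is, without the leading slash.

```apache
# URL-path          Result of matching against .+[^/]$
# ----------------  ------------------------------------------------
# idn-poker         matches, $0 = "idn-poker"
# blog/my-post      matches, $0 = "blog/my-post"
# idn-poker/        no match (ends in a slash), rule is skipped
# (empty, i.e. /)   no match (.+ needs at least one character)
# a                 no match (the pattern needs at least two characters)
#
# Only when the pattern matches is the condition evaluated, checking
# whether <DOCUMENT_ROOT>/<$0>.html exists as a file before rewriting.
RewriteCond %{DOCUMENT_ROOT}/$0\.html -f
RewriteRule .+[^/]$ $0.html [L]
```

Because the condition tests exactly the path that the substitution rewrites to, the check and the rewrite cannot disagree in the way REQUEST_FILENAME and $1 can.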
<IfModule mime_module>
Encasing all your directives in this <IfModule> wrapper is questionable. If the mime_module is not available then your site just fails silently. It's very unlikely that mod_mime is not available - is it not better to be notified of an error if this fails?
Bringing the above points together:
AddHandler application/x-httpd-ea-php72 .php .php7 .phtml
ErrorDocument 404 /error-documents/my-custom-404.html
RewriteEngine On
# Redirect to remove trailing slash on URLs (excluding directories)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.+)/$ /$1 [R=301,L]
# Redirect to remove ".html" extension if requested directly
RewriteCond %{THE_REQUEST} \.html
RewriteRule (.+)\.html$ /$1 [R=301,L]
# Append .html extension
RewriteCond %{DOCUMENT_ROOT}/$0\.html -f
RewriteRule .+[^/]$ $0.html [L]
Upvotes: 2
Reputation: 41219
You need to match the trailing slash as an optional character in your regex in order to avoid rewriting to /file/.html, which is a broken path and may result in a server error.
Change your rule to:
RewriteRule ^(.*?)/?$ $1.html [L]
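A minimal sketch of how that rule might sit within the block from the question, assuming the three existing conditions are kept unchanged:

```apache
# For the URL-path "idn-poker/" the lazy group (.*?) captures "idn-poker"
# and the optional /? consumes the trailing slash, so the substitution
# becomes "idn-poker.html" rather than "idn-poker/.html".
# For "idn-poker" (no slash) the capture is likewise "idn-poker".
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*?)/?$ $1.html [L]
```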
Upvotes: 1