Reputation: 113
I'm getting a Bad Request error (message below) from the browser for a URL:
Bad Request
Your browser sent a request that this server could not understand. Client sent malformed Host header
Eventually I realized that the URL contains the special character sequence "%u". How can I remove it using .htaccess?
For example, I want to change the URL from
http://www.example.com/property-listings/A/B/C/D/E/F-%uG/H/I-101.html
to
http://www.example.com/property-listings/A/B/C/D/E/F-G/H/I-101.html
Any thoughts?
regards,
Upvotes: 0
Views: 986
Reputation: 11
The short answer is that you can't — at least, not using .htaccess.
This is because the %u is parsed (or rather, not parsed) by Apache before the request ever reaches the .htaccess file. Unfortunately the request itself is syntactically wrong, and Apache cannot parse it, hence the 400 Bad Request.
The %uHHHH syntax was a non-standard (IIS) way of encoding Unicode characters. %uHHHH represented the Unicode character U+HHHH, where HHHH is the code point in hexadecimal. For example, %u20AC represented the character €.
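To make the encoding concrete, here is a minimal sketch that rewrites each %uHHHH sequence as the standard percent-encoded UTF-8 bytes of the same character. The function name percentUToUtf8 is hypothetical (it is not part of any library), and the sketch assumes PHP 7 or later with the json extension available.

```php
<?php
// Hypothetical helper, for illustration only: converts every %uHHHH
// sequence into the percent-encoded UTF-8 bytes of U+HHHH.
function percentUToUtf8(string $uri): string
{
    $result = preg_replace_callback(
        '/%u([0-9a-f]{4})/i',
        function (array $m): string {
            // Build the JSON string "\uHHHH" and let json_decode()
            // produce the corresponding UTF-8 character.
            $char = json_decode('"\u' . $m[1] . '"');
            // Lone surrogates cannot be decoded; leave them untouched.
            if (!is_string($char)) {
                return $m[0];
            }
            return rawurlencode($char);
        },
        $uri
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $uri;
}

echo percentUToUtf8('/price-%u20AC.html'), "\n"; // /price-%E2%82%AC.html
```

The percent-encoded form (%E2%82%AC for €) is what Apache and other standards-following servers expect, since a % must be followed by exactly two hexadecimal digits.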
Apache doesn't recognise the %uHHHH syntax (or any other stray % signs) and there's nothing you can do about it.
However, there is a workaround: you can use the ErrorDocument directive to handle the 400 Bad Request error using a PHP script (or whatever scripting language you're using).
E.g.
In your httpd.conf add the following line:
ErrorDocument 400 /400.php
This has to be added to the main Apache configuration (httpd.conf). You can't add this to your .htaccess for security reasons, even though you can add the directive for other HTTP response codes (e.g. 404 and 500). Apache considers an ErrorDocument directive for a 400 response code to be a security risk:
Although most error messages can be overridden, there are certain circumstances where the internal messages are used regardless of the setting of ErrorDocument. In particular, if a malformed request is detected, normal request processing will be immediately halted and the internal error message returned. This is necessary to guard against security problems caused by bad requests.
(From the Apache documentation.)
Then create the file 400.php in your web root:
<?php
// Fall back to an empty string so the preg_* calls below never receive null
$uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
if (preg_match('!%u[0-9a-f]{4}!i', $uri)) {
    // Convert all %uHHHH encodings to UTF-8 characters
    $redirectUri = preg_replace_callback('!%u([0-9a-f]{4})!i', function ($matches) {
        // json_decode() turns the JSON escape "\uHHHH" into the UTF-8 character
        return json_decode('"\u' . $matches[1] . '"');
    }, $uri);
    header('HTTP/1.1 301 Moved Permanently');
    header("Location: $redirectUri");
    die;
}
// Apache returned 400 Bad Request for some other reason, so just display the
// default error page
// Return a 404 Not Found response if anyone accesses the URL /400.php directly
$errorCode = preg_match('!^/400\.php!', $uri) ? 404 : 400;
// Send the matching status line (http_response_code() needs PHP 5.4 or later)
http_response_code($errorCode);
?>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title><?php echo $errorCode == 404 ? '404 Not Found' : '400 Bad Request'; ?></title>
</head><body>
<?php if ($errorCode == 404) { ?>
<h1>Not Found</h1>
<p>The requested URL <?php echo htmlspecialchars(preg_replace('!([^?#]+).*!', '$1', $uri)); ?> was not found on this server.</p>
<?php } else { ?>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.</p>
<?php } ?>
</body></html>
This will redirect any request containing a %uHHHH-encoded character to the same URI, but using UTF-8.
I know this doesn't exactly answer your question (because your own URI contains the string %u, without any hexadecimal code), but you can easily adapt the script for your own purposes, and the script as I've written it will be more useful generally to other people.
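As one possible adaptation for the URI in the question, the sketch below simply drops any %u that is not followed by four hexadecimal digits. The function name stripStrayPercentU is hypothetical, and the sketch assumes PHP 7 or later; it would only help if the 400 handler is actually invoked for the request.

```php
<?php
// Hypothetical helper, for illustration only: removes any "%u" that is
// NOT followed by four hex digits, leaving valid %uHHHH sequences alone.
function stripStrayPercentU(string $uri): string
{
    // The negative lookahead (?!...) keeps well-formed %uHHHH intact.
    $cleaned = preg_replace('/%u(?![0-9a-f]{4})/i', '', $uri);
    // preg_replace() returns null on a regex error; fall back to the input.
    return $cleaned ?? $uri;
}

echo stripStrayPercentU('/property-listings/A/B/C/D/E/F-%uG/H/I-101.html'), "\n";
// /property-listings/A/B/C/D/E/F-G/H/I-101.html
```

Inside 400.php, the cleaned value could be used as the redirect target in place of $redirectUri. It is worth redirecting only when the cleaned URI differs from the original, to avoid a redirect loop.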
Upvotes: 1