Chillu

Reputation: 113

How to remove %u from a URL using .htaccess

I'm getting a Bad Request error (message below) from the browser for a URL:

Bad Request

Your browser sent a request that this server could not understand. Client sent malformed Host header

Eventually I realized that the URL contains the special sequence "%u". How can I remove it using .htaccess?

For example, I want to change the URL from

http://www.example.com/property-listings/A/B/C/D/E/F-%uG/H/I-101.html

to

http://www.example.com/property-listings/A/B/C/D/E/F-G/H/I-101.html

Any thoughts?

regards,

Upvotes: 0

Views: 986

Answers (2)

carlu

Reputation: 11

The short answer is that you can't — at least, not using .htaccess.

This is because Apache parses the request (and rejects the stray %u) before it ever reads the .htaccess file. The request itself is syntactically invalid, so Apache cannot parse it and returns 400 Bad Request.

The %uHHHH was a non-standard (IIS) way of encoding Unicode characters. %uHHHH represented the Unicode character U+HHHH, where HHHH is the hexadecimal code point. For example, %u20AC represented the character € (the euro sign).

Apache doesn't recognise the %uHHHH syntax (or any other stray % signs) and there's nothing you can do about it.

However, there is a workaround — you can use the ErrorDocument directive to handle the 400 Bad Request error using a PHP script (or whatever scripting language you're using).

E.g.

In your httpd.conf add the following line:

ErrorDocument 400 /400.php

This has to be added to the main Apache configuration (httpd.conf). You can't add it to your .htaccess for security reasons, even though you can add the directive there for other HTTP response codes (e.g. 404 and 500). Apache treats an ErrorDocument directive for a 400 response code as a security risk:

Although most error messages can be overridden, there are certain circumstances where the internal messages are used regardless of the setting of ErrorDocument. In particular, if a malformed request is detected, normal request processing will be immediately halted and the internal error message returned. This is necessary to guard against security problems caused by bad requests.

(From the Apache documentation.)

Then create the file 400.php in your web root:

<?php

// Default to an empty string so the preg_* calls below never receive null
$uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
if (preg_match('!%u[0-9a-f]{4}!i', $uri)) {
    // Convert all %uHHHH encodings to UTF-8 characters
    $redirectUri = preg_replace_callback('!%u([0-9a-f]{4})!i', function ($matches) {
        // json_decode turns the JSON escape "\uHHHH" into the UTF-8 character
        return json_decode('"\u' . $matches[1] . '"');
    }, $uri);
    header('HTTP/1.1 301 Moved Permanently');
    header("Location: $redirectUri");
    die;
}

// Apache returned 400 Bad Request for some other reason, so just display the
// default error page

// Return a 404 Not Found response if anyone accesses the URL /400.php directly
$errorCode = preg_match('!^/400\.php!', $uri) ? 404 : 400;

?>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title><?php echo $errorCode == 404 ? '404 Not Found' : '400 Bad Request'; ?></title>
</head><body>
<?php if ($errorCode == 404) { ?>
<h1>Not Found</h1>
<p>The requested URL <?php echo htmlspecialchars(preg_replace('!([^?#]+).*!', '$1', $uri)); ?> was not found on this server.</p>
<?php } else { ?>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.</p>
<?php } ?>
</body></html>

This will redirect any request containing a %uHHHH-encoded character to the same URI, but using UTF-8.
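One possible refinement, offered as a sketch rather than as part of the script above: URIs are formally ASCII-only, so the decoded characters can be percent-encoded before they go into the Location header. The function name percent_u_to_utf8 and the sample path are made up for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of 400.php as posted): rewrite each %uHHHH
// sequence as the percent-encoded UTF-8 bytes of the character U+HHHH.
function percent_u_to_utf8(string $uri): string
{
    return preg_replace_callback(
        '!%u([0-9a-f]{4})!i',
        function (array $m): string {
            // Same json_decode trick as in 400.php: "\uHHHH" -> UTF-8 character
            $char = json_decode('"\u' . $m[1] . '"');
            // If the escape does not decode to a string, leave it untouched
            return is_string($char) ? rawurlencode($char) : $m[0];
        },
        $uri
    );
}

// Illustrative path only
echo percent_u_to_utf8('/price-%u20AC.html'), "\n"; // prints /price-%E2%82%AC.html
```

If you use something like this inside 400.php, it is worth redirecting only when the result differs from the original URI, otherwise a sequence that fails to decode would redirect to itself.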

I know this doesn't exactly answer your question (because your own URI contains the string %u, without any hexadecimal code), but you can easily adapt the script for your own purposes, and the script as I've written it will be more useful generally to other people.
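As a rough sketch of that adaptation, assuming the goal is simply to drop a bare %u that is not followed by four hexadecimal digits: the function name strip_stray_percent_u is hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical adaptation for the URL in the question: remove every "%u"
// that is NOT followed by four hex digits, leaving real %uHHHH sequences alone.
function strip_stray_percent_u(string $uri): string
{
    // (?!...) is a negative lookahead, so the hex digits are never consumed
    return preg_replace('!%u(?![0-9a-f]{4})!i', '', $uri);
}

$uri = '/property-listings/A/B/C/D/E/F-%uG/H/I-101.html';
$clean = strip_stray_percent_u($uri);
if ($clean !== $uri) {
    // In 400.php, this is where the 301 redirect to $clean would be sent
    echo $clean, "\n"; // prints /property-listings/A/B/C/D/E/F-G/H/I-101.html
}
```

Note that a stray %u which happens to be followed by four hex-looking characters (e.g. %uFace) cannot be told apart from a real %uHHHH sequence, so this sketch leaves it in place.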

Upvotes: 1

Jon Lin

Reputation: 143886

Try:

RewriteRule ^(.*)%u(.*)$ /$1$2 [L,R=301]

Upvotes: 0
