Panagiotis

Reputation: 1567

urlencode internet explorer problem

I am having a strange problem with Internet Explorer and Greek-encoded GET variables. I have a simple PHP script (UTF-8 encoded) that does the following:

$term = $_GET['term'];
// term processing
echo htmlspecialchars($term);
echo '<form method="GET" action="/script.php">';
echo '<input type="hidden" name="ref" value="'.$term.'">';
echo '<input type="submit" value="do submit()">';
echo '</form>';

I call the script the usual way: http://localhost/form.php?term=διαστημοπλοιο. In Chrome, Firefox (Mac), and Safari (Mac) the value comes through correctly (it displays %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF, i.e. percent-encoded UTF-8), but in Internet Explorer 8+ (Windows) I get garbled text (empty squares). Is there something I should check? I used mb_detect_encoding, and it reports the string as UTF-8.

I've also run the following tests (PHP code below), and the results differ between browsers:

header("Content-type:text/html;charset=utf-8");
$term = $_GET['term'];
echo "<pre>";
echo "DECODED:".urldecode($term)."\n";
echo "TERM ENCODING (FROM BROWSER) [mb_detect_encoding]:".mb_detect_encoding($term)."\n";
echo "UTF-8:\n";
echo 'URLDECODE:'."\t".urlencode($term)."\n";
echo 'RAWURLDECODE:'."\t".rawurlencode($term)."\n";
echo 'URLDECODE:'."\t".urldecode(utf8_encode(urlencode($term)))."\n";
echo 'RAWURLDECODE:'."\t".rawurlencode($term)."\n";
echo "UTF-8->cp1253 iv:\n";
$zz = iconv('UTF-8','cp1253',$term);
echo 'URLDECODE:'."\t".urlencode(iconv('UTF-8','cp1253',$term))."\n";
echo "cp1253->UTF-8 iv:\n";
echo 'URLDECODE:'."\t".urlencode(iconv('cp1253','UTF-8',$zz))."\n";
echo "</pre>";

The results on mac/linux were:

DECODED:διαστημοπλοιο
TERM ENCODING (FROM BROWSER) [mb_detect_encoding]:UTF-8
UTF-8:
URLDECODE:  %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF
RAWURLDECODE:   %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF
URLDECODE:  διαστημοπλοιο
RAWURLDECODE:   %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF
UTF-8->cp1253 iv:
URLDECODE:  %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
cp1253->UTF-8 iv:
URLDECODE:  %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF

The results on IE were:

DECODED:�������������
TERM ENCODING (FROM BROWSER) [mb_detect_encoding]:UTF-8
UTF-8:
URLDECODE:  %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
RAWURLDECODE:   %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
URLDECODE:  �������������
RAWURLDECODE:   %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
UTF-8->cp1253 iv:

Notice:  iconv() [function.iconv]: Detected an illegal character in input string in /Users/xalapan/Sites/drupal/io.php on line 15

Notice:  iconv() [function.iconv]: Detected an illegal character in input string in /Users/xalapan/Sites/drupal/io.php on line 16
URLDECODE:  
cp1253->UTF-8 iv:
URLDECODE:
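
Comparing the two runs, the bytes IE sent (%E4%E9%E1...) are the same as the "UTF-8->cp1253" output from the other browsers, which suggests IE submitted the term in Windows-1253 rather than UTF-8. mb_detect_encoding reported UTF-8 in both runs, so it did not tell the two cases apart here. A minimal sketch of a stricter check follows, assuming the mbstring and iconv extensions are loaded; the helper name to_utf8() and the choice of CP1253 as the fallback charset are assumptions for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: returns $value as UTF-8. Input that is not valid
// UTF-8 is assumed to be Windows-1253 (an assumption based on the IE
// output above, not something the browser tells us).
function to_utf8($value, $fallback = 'CP1253')
{
    // mb_check_encoding() validates the byte sequence instead of guessing.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    $converted = iconv($fallback, 'UTF-8', $value);
    // iconv() returns false on bytes the fallback charset does not define.
    return $converted === false ? '' : $converted;
}

// The bytes IE sent, taken from the percent-encoded output above.
$fromIe = "\xE4\xE9\xE1\xF3\xF4\xE7\xEC\xEF\xF0\xEB\xEF\xE9\xEF";

// Prints the same percent-encoded UTF-8 string the other browsers produced.
echo rawurlencode(to_utf8($fromIe)), "\n";
```

Guessing a fallback charset is fragile, since a different locale would send different bytes, so this is better treated as a diagnostic than as a fix.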

Upvotes: 1

Views: 2253

Answers (1)

Tomalak

Reputation: 338148

You need htmlspecialchars(), not urlencode(). Printing $term into the value attribute without htmlspecialchars() leads to XSS vulnerabilities.

echo '<input type="hidden" name="ref" value="'.htmlspecialchars($term).'">';
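
To see what the escaping does, here is a small sketch. The helper hidden_input() and the sample values are made up for illustration; the ENT_QUOTES flag and the explicit 'UTF-8' charset are my choices, not something the original script specifies.

```php
<?php
// Hypothetical helper: builds a hidden input with name and value escaped
// for use inside a double-quoted HTML attribute.
function hidden_input($name, $value)
{
    return '<input type="hidden" name="'
        . htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
        . '" value="'
        . htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
        . '">';
}

// Greek text passes through unchanged; only HTML metacharacters are escaped.
echo hidden_input('ref', 'διαστημοπλοιο'), "\n";

// A value that tries to break out of the attribute is neutralised.
echo hidden_input('ref', '"><script>alert(1)</script>'), "\n";
```

The browser percent-encodes the field value itself when the form is submitted, so there is no need to urlencode() it in the markup.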

Also, the values in $_GET are already decoded. You must not decode them again.
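
A quick way to convince yourself, sketched with parse_str(), which decodes a query string the same way PHP does when it fills $_GET. The query string below is a made-up example chosen so that the second decode visibly changes the value.

```php
<?php
// parse_str() percent-decodes each value once, like PHP does for $_GET.
parse_str('term=a%2Bb%2525', $get);

$once  = $get['term'];      // "a+b%25" (what $_GET['term'] would hold)
$twice = urldecode($once);  // "a b%"   (the '+' and "%25" are decoded again)

echo "decoded once:  ", $once, "\n";
echo "decoded twice: ", $twice, "\n";
```

With pure Greek UTF-8 text the second urldecode() happens to be a no-op, because the decoded bytes contain no '%' or '+', but it corrupts any value that does contain them.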

Upvotes: 1
