Reputation: 1567
I am having a strange problem with Internet Explorer and Greek-encoded GET variables. I have a simple PHP script (saved as UTF-8) that does the following:
$term = $_GET['term'];
// term processing
echo htmlspecialchars($term);
echo '<form method="GET" action="/script.php">';
echo '<input type="hidden" name="ref" value="'.$term.'">';
echo '<input type="submit" value="do submit()">';
echo '</form>';
I call the script the usual way: http://localhost/form.php?term=διαστημοπλοιο. In Chrome, Firefox (Mac), and Safari (Mac) the value arrives correctly (it displays %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF, i.e. percent-encoded UTF-8). In Internet Explorer 8+ (Windows), however, I get garbled text (empty squares). Is there something I should check? I used mb_detect_encoding and it reports the string as UTF-8.
I've also run the following tests (PHP code below), and the results differ between browsers:
header("Content-type:text/html;charset=utf-8");
$term = $_GET['term'];
echo "<pre>";
echo "DECODED:".urldecode($term)."\n";
echo "TERM ENCODING (FROM BROWSER) [mb_detect_encoding]:".mb_detect_encoding($term)."\n";
echo "UTF-8:\n";
echo 'URLDECODE:'."\t".urlencode($term)."\n";
echo 'RAWURLDECODE:'."\t".rawurlencode($term)."\n";
echo 'URLDECODE:'."\t".urldecode(utf8_encode(urlencode($term)))."\n";
echo 'RAWURLDECODE:'."\t".rawurlencode($term)."\n";
echo "UTF-8->cp1253 iv:\n";
$zz = iconv('UTF-8','cp1253',$term);
echo 'URLDECODE:'."\t".urlencode(iconv('UTF-8','cp1253',$term))."\n";
echo "cp1253->UTF-8 iv:\n";
echo 'URLDECODE:'."\t".urlencode(iconv('cp1253','UTF-8',$zz))."\n";
echo "</pre>";
The results on mac/linux were:
DECODED:διαστημοπλοιο
TERM ENCODING (FROM BROWSER) [mb_detect_encoding]:UTF-8
UTF-8:
URLDECODE: %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF
RAWURLDECODE: %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF
URLDECODE: διαστημοπλοιο
RAWURLDECODE: %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF
UTF-8->cp1253 iv:
URLDECODE: %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
cp1253->UTF-8 iv:
URLDECODE: %CE%B4%CE%B9%CE%B1%CF%83%CF%84%CE%B7%CE%BC%CE%BF%CF%80%CE%BB%CE%BF%CE%B9%CE%BF
The results on IE were:
DECODED:�������������
TERM ENCODING (FROM BROWSER) [mb_detect_encoding]:UTF-8
UTF-8:
URLDECODE: %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
RAWURLDECODE: %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
URLDECODE: �������������
RAWURLDECODE: %E4%E9%E1%F3%F4%E7%EC%EF%F0%EB%EF%E9%EF
UTF-8->cp1253 iv:
Notice: iconv() [function.iconv]: Detected an illegal character in input string in /Users/xalapan/Sites/drupal/io.php on line 15
Notice: iconv() [function.iconv]: Detected an illegal character in input string in /Users/xalapan/Sites/drupal/io.php on line 16
URLDECODE:
cp1253->UTF-8 iv:
URLDECODE:
Upvotes: 1
Views: 2253
Reputation: 338148
You need htmlspecialchars(), not urlencode(). Not using htmlspecialchars() leads to XSS vulnerabilities.
echo '<input type="hidden" name="ref" value="'.htmlspecialchars($term).'">';
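The same escaping can be wrapped in a small helper. This is only a sketch: the function name `hidden_input()` is hypothetical and not part of the original script, and it assumes the page is served as UTF-8.

```php
<?php
// Hypothetical helper (not from the original script): builds a hidden
// input whose name and value are escaped for use inside HTML attributes.
function hidden_input(string $name, string $value): string
{
    // ENT_QUOTES escapes both double and single quotes; the third
    // argument tells PHP to treat the input as UTF-8.
    $n = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $v = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    return '<input type="hidden" name="' . $n . '" value="' . $v . '">';
}

// A value that would break out of the attribute if it were not escaped.
echo hidden_input('ref', '"><script>alert(1)</script>'), "\n";
// prints: <input type="hidden" name="ref" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

Note that with the charset set to UTF-8, htmlspecialchars() returns an empty string for input that is not valid UTF-8 unless ENT_SUBSTITUTE or ENT_IGNORE is passed, so the bytes IE sends in the question would come out empty here rather than garbled.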
Also, the values in $_GET are already decoded. You must not decode them again.
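In practice, "already decoded" means `$_GET['term']` holds the bytes the browser sent with the percent-escapes resolved, so what remains open is which charset those bytes are in. The IE output in the question (`%E4%E9%E1...`) is identical to the UTF-8->cp1253 conversion produced in the other browsers, which suggests IE sent the term as cp1253 rather than UTF-8. The sketch below rests on that assumption: `normalize_term()` and its fallback charset are hypothetical, and it needs the mbstring and iconv extensions. It validates the bytes strictly and converts only when they are not valid UTF-8.

```php
<?php
// Hypothetical helper: returns $raw as UTF-8 without URL-decoding it again.
// ASSUMPTION: input that is not valid UTF-8 is cp1253 (Greek), as the IE
// output in the question suggests. Other legacy charsets are not handled.
function normalize_term(string $raw, string $fallback = 'CP1253'): string
{
    // mb_check_encoding() validates the byte sequence strictly.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // iconv() returns false (and raises a notice) on bytes that are
    // undefined in the source charset; treat that as "no usable term".
    $converted = @iconv($fallback, 'UTF-8', $raw);
    return $converted === false ? '' : $converted;
}

// The first three bytes IE sent in the question: cp1253 for "δια".
echo rawurlencode(normalize_term("\xE4\xE9\xE1")), "\n";
// prints: %CE%B4%CE%B9%CE%B1
```

The strict check matters because, in the question's IE run, mb_detect_encoding() reported UTF-8 even for the cp1253 bytes.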
Upvotes: 1