Reputation: 17314
This is a follow-up to my last question here. The answer posted there does not actually work. So here is the challenge. You are given this code (assume jQuery is included):
<input type=text>
<script>
$("input").val(**YOUR PHP / JS CODE HERE**);
</script>
Using jQuery - and not by injecting PHP output directly into the input tag - faithfully reproduce ANY text from the database in the input tag. If the database field says </script>, the field should say that too. If it has Chinese in it, double quotes, whatever, reproduce that too. Assume your PHP variable is called $text.
Here are some of my failed attempts.
1)
$("input").val("<?= htmlentities($text); ?>");
FAILURE: The HTML entities are reproduced literally in the text field instead of being decoded.
INPUT: $text = "Déjà vu"
OUTPUT: Field contains literal D&eacute;j&agrave; vu
2)
$("input").val(<?= json_encode($text); ?>);
This was suggested as the answer in my last question, and I naively accepted it. However...
FAILURE: json_encode only works with UTF-8 strings.
INPUT: $text = "Va e de här fö frågor egentlien"
OUTPUT: Field is blank, because json_encode returns null.
3)
var temp = $("<div></div>").html("<?= htmlentities($text); ?>");
$("input").val(temp.html());
This was my most promising solution for the weird characters, except...
FAILURE: Some characters stay encoded (not sure exactly which, don't care)
INPUT: $text = "</script> Déjà"
OUTPUT: Field contains &lt;/script&gt; Déjà
4) Suggested in answers
$("input").val(unescape("<?= urlencode($text); ?>"));
FAILURE: Spaces remain encoded as +'s.
$("input").val(unescape("<?= rawurlencode($text); ?>"));
Almost works. All previous input succeeds, but multibyte stuff, like kanji, remains encoded. decodeURIComponent also doesn't like multibyte characters.
Note that for me, things like strip_tags are not an option. Everything must be allowed. People are authoring quizzes with this, and if someone wants to make a quiz that tests your knowledge of HTML, so be it. Also, unfortunately, I cannot just inject the htmlentities-escaped text into the value field of the input tags. These tags are generated dynamically, and I would have to totally tear down my current JavaScript code structure to do it that way.
I feel like I'm SOL here. Please show me how wrong I am.
Assume the user initially entered </script> Déjà här fö frågor 漢字 into the db. This would be stored (you would see it in phpMyAdmin) as </script> Déjà här fö frågor &#28450;&#23383;
Upvotes: 2
Views: 318
Reputation: 6675
Safe JavaScript escaping for ASCII strings.
<?php
// Escape a single-byte string for use inside a JavaScript string literal.
function js_encode($string)
{
    $cleaned = is_null($string) ? null : '';
    // for each byte of the string
    for ($i = 0, $len = strlen($string); $i < $len; $i++)
    {
        // get ascii number
        $ord = ord($string[$i]);
        // if [0-9] or [A-Z] or [a-z]
        $cleaned .= ((47 < $ord && $ord < 58) || (64 < $ord && $ord < 91) || (96 < $ord && $ord < 123))
            // use existing character
            ? $string[$i]
            // otherwise escape it as \xNN (JavaScript requires exactly two hex digits)
            : sprintf('\\x%02x', $ord);
    }
    return $cleaned;
}
For Unicode text it is a little more complicated. I am going to start with this and see if I need the more complex version.
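One possible shape for the Unicode-aware version is to escape UTF-16 code units as \uXXXX instead of bytes as \xNN. The sketch below shows the idea in JavaScript rather than PHP, purely as an illustration; jsEscape is a hypothetical name, and a PHP port would first have to decode the input into code points.

```javascript
// Hypothetical helper: keep [0-9A-Za-z] as-is and escape every other
// UTF-16 code unit as \uXXXX (always four hex digits).
function jsEscape(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (/[0-9A-Za-z]/.test(ch)) {
      out += ch;
    } else {
      out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
    }
  }
  return out;
}

console.log(jsEscape("a b")); // a\u0020b
```

Because characters outside the BMP are already stored as surrogate pairs in a JavaScript string, escaping unit by unit keeps them intact when the literal is parsed again.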
Upvotes: 0
Reputation: 17314
I have found a "good enough" solution that you all might find interesting.
1. utf8_encode the string on the way into the database. This makes sure that it can be safely handled on the way out by the following step.
2. On the way out, pass it through esc():
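function repl($match)
{
    // JavaScript needs exactly four hex digits after \u, so pad with zeros.
    // Code points above 0xFFFF would need a surrogate pair and are not handled here.
    return sprintf('\\u%04x', $match[1]);
}
function esc($string)
{
    $s = json_encode($string);
    $s = preg_replace_callback("/&#([0-9]+);/", "repl", $s);
    return $s;
}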
This isn't absolutely perfect, because there doesn't seem to be any way for the PHP to know the difference between the user typing 漢 and literally typing &#28450;. So if you type the latter it will become the former. But I doubt anyone will ever want to do that anyway.
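The same substitution could also be done on the client instead of in PHP. This is only a sketch under that assumption: decodeNumericEntities is a hypothetical helper, it handles decimal references only, and it has the same ambiguity described above.

```javascript
// Hypothetical helper: turn decimal numeric character references
// (the same pattern the PHP regex matches) into real characters.
function decodeNumericEntities(text) {
  return text.replace(/&#([0-9]+);/g, (match, digits) => {
    const codePoint = Number(digits);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities("fr&#229;gor &#28450;&#23383;")); // frågor 漢字
```

With jQuery this would be used as $("input").val(decodeNumericEntities(value)), where value is the already-decoded JSON string.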
Upvotes: 1
Reputation: 25271
What encoding is your text in, if not UTF-8? If you don't know, you don't have text, you have a byte sequence, which is much harder to faithfully represent. If you do know, you can do something like this using the PHP multibyte string extension:
$("input").val(<?= json_encode(mb_convert_encoding($text, "UTF-8", "ISO-8859-1")); ?>);
Here I've presumed your input is in ISO-8859-1 aka Latin-1 encoding, which is a pretty common case for database strings.
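To illustrate why the source encoding matters, here is a small JavaScript sketch (not PHP, and not part of the fix itself). The helper names are made up for the example. It shows that the ISO-8859-1 bytes for "Déjà" are not valid UTF-8, which is the situation where a UTF-8-only encoder gives up, and that they become valid once converted.

```javascript
// Bytes of "Déjà" in ISO-8859-1: one byte per character.
const latin1Bytes = new Uint8Array([0x44, 0xe9, 0x6a, 0xe0]);

// Hypothetical helper: ISO-8859-1 maps each byte to the code point of the same value.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Hypothetical helper: true if the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

const text = decodeLatin1(latin1Bytes);           // "Déjà"
const utf8Bytes = new TextEncoder().encode(text); // re-encoded as UTF-8

console.log(isValidUtf8(latin1Bytes)); // false
console.log(isValidUtf8(utf8Bytes));   // true
console.log(utf8Bytes.length);         // 6
```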
EDIT: This is in response to the comments about a closing script tag. I made this test file and it displays properly for me, at least in Firefox 3.6:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Test</title>
<script src='http://code.jquery.com/jquery-1.4.2.js'></script>
</head>
<form name='foo'>
<input name='bar' id='bar'/>
</form>
<script language="JavaScript">
$('input').val("<\/script>");
</script>
</html>
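The `<\/script>` form works because the HTML parser never sees a literal closing tag inside the script block. PHP's json_encode produces that form by default, since it escapes forward slashes. As a rough JavaScript illustration of the same idea (toInlineScriptLiteral is a hypothetical helper, not anything from jQuery or PHP), the `<` can be escaped instead:

```javascript
// Hypothetical helper: serialize a value as a JS literal that can be
// inlined in a <script> block without ending it early.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003c")        // hides "</script>" and "<!--" from the HTML parser
    .replace(/\u2028/g, "\\u2028")   // line separators are not allowed raw in older JS engines
    .replace(/\u2029/g, "\\u2029");
}

const literal = toInlineScriptLiteral("</script> Déjà");
console.log(literal); // "\u003c/script> Déjà"
```

The result is still valid JSON, so JSON.parse(literal) gives back the original string.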
Upvotes: 1
Reputation: 50700
You need to encode in PHP, and decode in JavaScript...
PHP's rawurlencode():
echo rawurlencode("</script> Déjà");
//result: %3C%2Fscript%3E%20D%C3%A9j%C3%A0
JavaScript's decodeURIComponent():
var encoded = "%3C%2Fscript%3E%20D%C3%A9j%C3%A0";
alert(decodeURIComponent(encoded));
//result: </script> Déjà
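A slightly fuller sketch of the decoding side, assuming the PHP string is UTF-8 before it is encoded. decodePercentEncoded and its treatPlusAsSpace flag are hypothetical names for this example. The flag covers form-style encoders such as urlencode, which write spaces as "+".

```javascript
// Hypothetical helper: decode text that was percent-encoded as UTF-8 on the server.
function decodePercentEncoded(encoded, treatPlusAsSpace = false) {
  const normalized = treatPlusAsSpace ? encoded.replace(/\+/g, "%20") : encoded;
  return decodeURIComponent(normalized);
}

console.log(decodePercentEncoded("%3C%2Fscript%3E%20D%C3%A9j%C3%A0")); // </script> Déjà
console.log(decodePercentEncoded("%E6%BC%A2%E5%AD%97"));               // 漢字

// Percent-encoded ISO-8859-1 bytes are not valid UTF-8, so decoding throws.
try {
  decodePercentEncoded("D%E9j%E0");
} catch (e) {
  console.log(e instanceof URIError); // true
}
```

So multibyte text decodes fine as long as the bytes being encoded are UTF-8. If they are in another encoding, convert them first.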
Upvotes: 1
Reputation: 97835
You can use:
- base64_encode
- rawurlencode (probably the easiest option)
- htmlspecialchars with ENT_QUOTES, or perhaps a combination of htmlspecialchars with ENT_NOQUOTES and addslashes if you don't want your quotes to turn into HTML entities.
Upvotes: 0