Reputation: 11
I have a form which accepts text input. I would like it to be able to accept characters such as & and ; and > and <, which are useful characters for the data being supplied by the user. I want the user to, for example, be able to say:
The ampersand (&) is encoded as &amp; (and I see from the preview that I can't even type that here directly: to make it read "The ampersand (&) is encoded as &amp;" I had to type amp; twice after the ampersand. By the way, the preview is cool, but I can't count on users having scripts enabled.)
I parse the data, and if there is a problem with it, I present the user's entry back to the user, in the same form, prefilled in the same field, for editing and resubmission.
If I present the raw data, I run the risk of having hostile input (such as scripts or HTML) executed by the browser. However, if I filter it (such as via htmlspecialchars), then the user would see (a representation of) the character he had typed (say, the ampersand), but when he resubmits, he will actually be submitting the replacement (in this case what looks like &amp;), which as it turns out even contains an ampersand. If there is still a problem with the input, it will be presented again for editing, and we'll be another level deep in replacements.
User data is accepted only when what the user actually submits is identical to the sanitized version of the data. It is destined for a text file on the server, and an email sent to the organization behind the website.
I suppose the "question that can be answered" is "is this even possible?"
Jose
edit:
<?php
$var=$_GET["test2"];
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
<title>Input Escape Test</title>
</head><body>
The php parser would store the following input:<br>
<?php echo $var ?>
<br>
<form method="get" action="test.php"><p>
<label for="test2">Test - question five: <br>type in a character on the first line<br>and its HTML entity on the second line.</label>
<textarea id="test2" name="test2" cols="50" rows="3"><?php echo $var; ?></textarea><br/>
<input type="submit"/>
</p></form>
</body></html>
results in a form where the user attempts to answer the question with ampersand ampersand a m p semicolon. If that gets rejected (say, because of other illegal characters), the user is presented with his input back, minus the stripped characters. However, the a m p semicolon is also stripped from view (though it is still in the source). The user will then attempt to add another a m p semicolon to the displayed result.
The only way for the user to see ampersand a m p semicolon displayed (upon rejected input) is to type in ampersand a m p semicolon a m p semicolon.
Finally satisfied, the user clicks submit again, and the a m p semicolon seemingly disappears again. The user doesn't know what his (submitted) answer will be stored as.
I want the user to be able to type in: ampersand a m p semicolon and, upon rejection, see ampersand a m p semicolon and upon acceptance, store ampersand a m p semicolon
Jose
Upvotes: 1
Views: 1451
Reputation: 14077
Yes, this is possible in JavaScript as well as in server-side code. Since you said you can't count on users having JavaScript enabled, I assume you want to do this conversion on the server side. Let the user send the form data to your server-side code via a POST request, and there transform every occurrence of <, >, &, " and ' into its respective entity when you write the data back into the HTML response page. It will then show up in the browser exactly as the user entered it.
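A minimal sketch of that server-side step, assuming PHP (as in the question) and the question's field name test2. The helper names escape_for_html and render_textarea are made up for illustration, and the ISO-8859-1 charset is taken from the test pages in this thread:

```php
<?php
// Hypothetical helper: escape user text before writing it into HTML.
function escape_for_html(string $text): string
{
    // ENT_QUOTES converts both " and ' in addition to &, < and >.
    return htmlspecialchars($text, ENT_QUOTES, 'ISO-8859-1');
}

// Hypothetical helper: build a textarea prefilled with the escaped value.
function render_textarea(string $name, string $value): string
{
    return '<textarea name="' . escape_for_html($name) . '" cols="50" rows="3">'
        . escape_for_html($value) . '</textarea>';
}

// Fall back to an empty string when the form has not been submitted yet.
$submitted = isset($_POST['test2']) ? (string) $_POST['test2'] : '';

echo render_textarea('test2', $submitted), "\n";
```

The value is escaped only at the moment it is written into the page; the copy you validate and store stays exactly as the user submitted it.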
Edit: Sorry, I didn't read your question carefully enough. You should be able to use just one level of escaping, i.e. write &amp; for a '&' and not &amp;amp;. This one level is stripped when the browser parses your page, so it is gone from the data when it gets sent back as form data. Have a look at the generated HTML and try to find out what makes you need that second level of escapes.
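The round trip can be checked without a browser. This is only a sketch: html_entity_decode() stands in for the decoding the browser does while parsing the page, and the variable names are made up for illustration.

```php
<?php
// What the user typed: a literal ampersand followed by its entity.
$typed = '& &amp;';

// Escape once when writing the value into the page source.
$in_page = htmlspecialchars($typed, ENT_QUOTES, 'ISO-8859-1');

// Stand-in for the browser: entities are decoded once while parsing,
// so this is what the user sees and what gets resubmitted.
$resubmitted = html_entity_decode($in_page, ENT_QUOTES, 'ISO-8859-1');

// Escaping text that is already escaped adds the unwanted extra level.
$escaped_twice = htmlspecialchars($in_page, ENT_QUOTES, 'ISO-8859-1');
$shown_after_double = html_entity_decode($escaped_twice, ENT_QUOTES, 'ISO-8859-1');

echo $in_page, "\n";
echo $resubmitted, "\n";
echo $shown_after_double, "\n";
```

If $resubmitted matches $typed but your real form still grows an extra level, the stored value is probably being escaped before it is saved or compared, and then escaped again on output.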
Edit2 in response to the comments: Here is a simple test page that works as expected in IE 8.0 and Firefox. When you press the send button, you will see what is sent to the server in the address bar of your browser (the %26 is just the URL encoding of the &). As you can see, the escaping is stripped from the value and also from the data that is sent to the server.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-type" content="text/html;charset=ISO-8859-1" />
<title>Input Escape Test</title>
</head><body>
<form method="get" action=""><p>
<input name="test1" type="text" size="30" value="hello &amp; test"/><br/>
<textarea name="test2" cols="50" rows="3">hello &amp; test</textarea><br/>
<input type="submit"/>
</p></form>
</body></html>
Upvotes: 1
Reputation: 48357
When pushing data out of PHP, to the browser, to a database, anywhere, you MUST change its representation to one acceptable to the receiving end.
In the case of sending stuff to the browser, you need the htmlentities converter:
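// ENT_QUOTES also escapes single quotes, which matters because the attribute is single-quoted.
print "<input type='text' name='inp' value='" . htmlentities($_POST['inp'], ENT_QUOTES) . "'>\n";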
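To illustrate the "representation per receiving end" idea, here is a rough sketch in which the same raw value is encoded differently depending on where it goes. The sample string and variable names are made up, and the choice to store the raw value in the text file is an assumption based on what the question asks for:

```php
<?php
// Raw value exactly as the user submitted it (sample data).
$raw = 'Tom & "Jerry" <cat>';

// Going into HTML (element text or a quoted attribute value).
$for_html = htmlspecialchars($raw, ENT_QUOTES, 'ISO-8859-1');

// Going into a URL as a query-string component.
$for_url = rawurlencode($raw);

// Going into a plain text file: no HTML escaping, keep what was typed.
$for_file = $raw;

echo $for_html, "\n";
echo $for_url, "\n";
echo $for_file, "\n";
```

The point is that escaping belongs to the output step, so the data you keep around is never the HTML-escaped copy.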
C.
Upvotes: 0