Reputation: 216
I'm still learning PHP and SQL. I'm trying to create a simple content management system for a website's list of events. All of the input form fields are either Text areas or Text boxes (yes, I want them that way), and I want to leave the user the ability to add HTML links in addition to text in these fields. The following functions seem a good place to start with sanitizing the input I get from the user, but since I'm new to this I wanted to get the opinions of more knowledgeable developers. What more should I be doing to try to protect the database?
P.S. Thanks to CSS-Tricks for these functions.
function cleanInput($input) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
        '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments
    );
    $output = preg_replace($search, '', $input);
    return $output;
}
function sanitize($input) {
    if (is_array($input)) {
        foreach($input as $var=>$val) {
            $output[$var] = sanitize($val);
        }
    }
    else {
        if (get_magic_quotes_gpc()) {
            $input = stripslashes($input);
        }
        $input = cleanInput($input);
        $output = htmlentities($output);
        $output = mysql_real_escape_string($input);
    }
    return $output;
}
Upvotes: 0
Views: 6692
Reputation: 1677
While you don't have to sanitize your own string data that you display in the browser or store in a database, you must sanitize all user input that your website obtains through INPUT elements, TEXTAREA elements, from the keyboard via JavaScript/DOM Events, from uploaded files, and from all the other sources I've forgotten to list.
While database sanitizing is well-documented, and partially enforced in the latest version of server-side languages like PHP, there is still no universally-accepted way to sanitize the other sources of user input that I listed.
My own contribution is this little piece of PHP code, that allows any user input to be displayed on a web page or sent to another web page through GET or POST controls and fields in FORM elements or through Ajax without opening your website to malicious use:
function HTMLToSafeHTML($Str)
{
    return str_replace(['&','<','>','"','\''], ['&amp;','&lt;','&gt;','&quot;','&#39;'], $Str);
} // HTMLToSafeHTML
To use this function correctly, you must identify and track all user input, then call this function before displaying or otherwise allowing the user input to be interpreted as part of Web processing or programming. Identifying user input allows you to call this function only once. Calling it more than once will display its hard-to-read encoding, which is not useful as text.
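A minimal sketch of that rule, for illustration only. It assumes the five replacement strings are the standard entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`), restates the function so the snippet runs on its own, and uses a hypothetical `$userInput` value in place of real form data:

```php
<?php
// Encode the five HTML-significant characters as entities.
// '&' is listed first: str_replace() applies the pairs in order, so the
// ampersands introduced by the later replacements are not encoded again.
function HTMLToSafeHTML($Str)
{
    return str_replace(
        ['&', '<', '>', '"', '\''],
        ['&amp;', '&lt;', '&gt;', '&quot;', '&#39;'],
        $Str
    );
}

// Hypothetical hostile input; a real page would read it from $_POST or $_GET.
$userInput = '<script>alert("x")</script>';

// Encode exactly once, then wrap the result in trusted markup.
$message = 'Unknown event: <strong>' . HTMLToSafeHTML($userInput) . '</strong>';

echo $message, "\n";
```

Passing an already encoded value through the function again would turn `&lt;` into `&amp;lt;`, which is the hard-to-read double encoding mentioned above.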
For example, if you want to display an error message that shows some user input in boldface, you have to call HTMLToSafeHTML (which you can give a shorter name) on the user input before enclosing it in <strong>...</strong> to make it boldface. While it is harmless to display "<strong>", it is anything but harmless to display user input that might be the result of malicious users trying quite deliberately to break into your website in order to spread a virus or for some other evil purpose.
Upvotes: 0
Reputation: 31641
Quite easily:
$testinput = "<script>alert('p0wned');</script >\n
<a href='http://example.org' onclick=\"alert('p0Wned again!')\">Click me!</a>";
var_export(cleanInput($testinput));
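To see why this gets through, here is a self-contained version of the same check. It assumes the `cleanInput()` from the question, copied with the same three patterns, and splits the test string into two shorter ones; the variable names are only illustrative.

```php
<?php
// cleanInput() with the same patterns as posted in the question.
function cleanInput($input) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
        '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments
    );
    return preg_replace($search, '', $input);
}

// The closing tag has a space before '>', which browsers accept but the
// first pattern does not match, so the script block is left in place.
$script = "<script>alert('p0wned');</script >";

// Event-handler attributes are not covered by any of the three patterns.
$link = "<a href='http://example.org' onclick=\"alert('p0wned again!')\">Click me!</a>";

var_dump(cleanInput($script) === $script); // bool(true): nothing was stripped
var_dump(cleanInput($link) === $link);     // bool(true): onclick survives
```

Both strings come back unchanged, which is the general weakness of a blacklist: every variant it does not anticipate passes straight through.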
Also, htmlentities is almost always the wrong thing to use: called without a charset argument, as here, it will mangle UTF-8 input on PHP versions before 5.4. Also, you should not be storing HTML-escaped data in your DB. I'm not even sure why you use it here at all; won't you have to unescape the HTML to display it?
However, you are going about this the wrong way. To accept HTML from users, parse it with a real HTML parser such as DOMDocument or html5lib or even tidylib, and keep only what a whitelist allows. Unfortunately PHP doesn't seem to have anything as wonderful as Bleach on Python, so you will have to roll your own. An XSLT stylesheet with a whitelist seems like it might be a good way to handle this particular sanitization condition. Update: another user pointed out HTML Purifier, which is also a whitelist-based HTML sanitizer. I've never used it but it looks like "Bleach in PHP". You should definitely investigate.
A general outline of processing is like so:
Input
Make sure magic quotes are turned off:
if (get_magic_quotes_gpc()) die ('TURN OFF MAGIC QUOTES!!!!');
Store the data using the PDO library with prepared statements. This way you do not need to remember to escape data by hand.
Output
Escape your data inside your template. Individual fields of your data will need to be escaped differently. You almost always need to run it through htmlspecialchars before output; the only case you would not do that is when the data you need to display is already html (i.e. your whitelist-sanitized html fields). Define a helper function like this and use it in your templates:
function h($str) {
    return htmlspecialchars($str, ENT_QUOTES, 'utf-8');
}
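In a template, the helper might be used roughly as follows. This is only a sketch: the `$event` array and its keys are hypothetical, and `description_html` is assumed to have already passed through a whitelist sanitizer before it was stored.

```php
<?php
function h($str) {
    return htmlspecialchars($str, ENT_QUOTES, 'utf-8');
}

// Hypothetical row loaded from the database.
$event = [
    'title'            => 'Tom & Jerry <live>',
    'description_html' => '<a href="http://example.org">Details</a>',
];

// Plain-text fields are escaped at output time; the sanitized HTML field is not.
$html = '<h2>' . h($event['title']) . '</h2>'
      . '<p>' . $event['description_html'] . '</p>';

echo $html, "\n";
```

The title is stored raw and escaped only here, so the same row could later be rendered into a different format with that format's own escaping.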
Even better, try to use a template library that automatically escapes strings for you and that requires you to turn off escaping explicitly. (The common case should be simple to avoid errors, and having to escape is the common case!)
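For the storage step of the outline, a prepared statement with PDO might look like the sketch below. It assumes the `pdo_sqlite` driver and an in-memory database purely so the example runs on its own; the `events` table and its columns are hypothetical, and a MySQL-backed site would mainly change the DSN and credentials.

```php
<?php
// In-memory SQLite keeps the sketch self-contained (an assumption, not the asker's setup).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema for the events list.
$pdo->exec('CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, description TEXT)');

// Hypothetical user input, stored as received: no manual escaping, no entity encoding.
$title = "Bob's party'); DROP TABLE events; --";
$description = 'Bring <b>snacks</b>';

// Placeholders keep the values separate from the SQL text.
$insert = $pdo->prepare('INSERT INTO events (title, description) VALUES (:title, :description)');
$insert->execute([':title' => $title, ':description' => $description]);

$select = $pdo->prepare('SELECT title, description FROM events WHERE title = :title');
$select->execute([':title' => $title]);
$row = $select->fetch(PDO::FETCH_ASSOC);

var_dump($row['title'] === $title); // bool(true): stored and read back unchanged
```

The quote and the SQL fragment in the title are treated as data, so the table survives and the value round-trips without any call to an escaping function.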
Upvotes: 1