Reputation: 3948
I'm aware of the requirement to sanitize entered or form-submitted data on public websites. However, there are so many documents about security on the web that I'm totally confused about which route to take.
a) Currently, my MySQL tables use the MyISAM format; most are encoded in utf8_bin, others in latin1_swedish_ci. I take it that utf8_bin is preferable, but can I safely convert them?
b) Currently, I have the following gigantic 'converter' for all data I receive via POST/GET/REQUEST:
foreach($_POST as $k=>$v){
    if(ini_get('magic_quotes_gpc'))
        $_POST[$k]=stripslashes($_POST[$k]);
    $_POST[$k]=htmlspecialchars(strip_tags($_POST[$k]));
    $_POST[$k]=utf8_decode($_POST[$k]);
}
and on top of that, in SQL queries I use mysql_real_escape_string, which bloats the source a lot, particularly for large forms.
Is there a way to optimize that (do I really need all these conversions?) and especially, how can I ensure that with foreign character sets (like Chinese etc.) my forms etc. don't get completely messed up? Do I have to convert the data back before display?
Upvotes: 0
Views: 226
Reputation:
Sanitizing your inputs before using them in a SQL command to a database is necessary, but it cannot prevent all forms of SQL Injection. The best strategy to prevent this is to make use of Parameterized Queries, which allow the database to distinguish between what is meant to be data and what is meant to be command, so even if a bad input sneaks in and SQL commands appear in the data, the database knows to treat them as data.
Here's a good writeup about how to do this with PHP/PDO.
More about this in the excellent OWASP SQL Injection writeup.
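As a rough illustration of the point about parameterized queries, here is a minimal sketch using PDO. To keep it runnable without a server it assumes the pdo_sqlite extension and uses an in-memory SQLite database in place of MySQL; the users table and its columns are made up for the example.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL so this runs anywhere.
// The table and column names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO users (name) VALUES ('alice')");
$db->exec("INSERT INTO users (name) VALUES ('bob')");

// Hostile input that would change the meaning of a concatenated query.
$name = "alice' OR '1'='1";

// The placeholder keeps the SQL text and the data separate.
$stmt = $db->prepare('SELECT id, name FROM users WHERE name = :name');
$stmt->execute([':name' => $name]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// No row matches: the whole string was compared as a name, not run as SQL.
echo count($rows), "\n";
```

With string concatenation the same input would have matched every row; with the bound parameter it is just an odd-looking name that nobody has.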
Upvotes: 2
Reputation: 8459
You are doing too much to your data there. Transform a string only for the specific context in which it is about to be used; several of these operations are pointless in other contexts.
Upvotes: 0
Reputation: 197624
Your question touches several separate concerns: input encoding, output encoding, and database encoding.
Let's start at the very beginning, the input into your PHP application, namely $_POST in your example. You can reduce the number of cases dramatically by first making sure that the host your application runs on is properly configured:
<?php
/* Prevent the application from running if magic quotes are enabled. */
if (ini_get('magic_quotes_gpc')) {
    throw new Exception('Magic Quotes must be disabled.');
}
Then you pick the data from the $_POST array and perform multiple translations on it:
strip_tags - removes "HTML tags"
htmlspecialchars - encodes HTML special characters
utf8_decode - converts the character encoding from UTF-8 to ISO-8859-1
It looks as if you are throwing various functions at the input data somewhat blindly.
I can't judge your application, however, so I cannot say specifically whether you really need strip_tags or even htmlspecialchars.
Even though strip_tags is used for input filtering, the question is whether it applies to your case. Say there is a text field where users want to enter some text that might contain a tag: would that be a problem? Why remove it? Maybe the user entered that value for a reason. So it's good to actually know why and when you need to run strip_tags on input data.
The htmlspecialchars function is normally used for output, not input, so it's not clear to me why you use it here.
That pair, strip_tags and htmlspecialchars, is normally better handled at output. Here is an example of data going through an imaginary "My HTML favorite" application:
Request:
$_POST['text']: 'The tag I love most in HTML is <a>!';
Input Validation:
// This requires POST
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(405); // Method not allowed.
    exit;
}
// Specific values are required
if (!isset($_POST['text'])) {
    http_response_code(400); // Invalid request.
    exit;
}
// Some requests are just too large:
if (strlen($_POST['text']) > 5000) {
    http_response_code(400); // The request is invalid. Block it.
    exit;
}
// The expected input encoding is UTF-8
// This example is rather broad, you might want to limit
// it to a subset of Unicode characters instead.
if (!preg_match('/^.*$/su', $_POST['text'])) {
    http_response_code(400); // Invalid request.
    exit;
}
// The text field should not be larger than 2500 bytes
$input['text'] = $_POST['text'];
if (strlen($input['text']) > 2500) {
    // Give an error message to the user. The request is valid,
    // but there was a problem with what the user did, so
    // you need to tell them.
}
Database:
$db = new DatabaseConnection('Encoding: UTF-8');
$row = $db->getTable('Texts')->newRow();
$row['text'] = $input['text'];
$row->insert();
Display the result to the user:
header('Content-Type: text/html; charset=utf-8');
You just posted: <?php echo htmlspecialchars($input['text']); ?>
As this example shows, the input validation you do needs to be specific to your case. You should know which input character-set you expect and then make your application deal with it.
Besides that, using strip_tags on the input is not necessary in this example.
As this application runs on a properly configured host, you won't find any stripslashes here either.
The length check shows that there is more to check than just the basics. It always depends on your needs, and input should always have a limit. In this case there is a hard limit (5000) that rejects the request and a soft limit (2500) that notifies the user. E.g. the column in the database might have a specific size, so you couldn't store more anyway.
The database just does its job. As the example shows, the data is simply stored in it. That's why you should use a database layer of some kind that takes care of escaping for you, so you don't have to do it in many places in your scripts. If you don't know where to start, use the parameterized queries that PDO offers. PDO is a database abstraction in PHP that you can use with your MySQL database.
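One way such a layer might look is sketched below. TextStore, the texts table and the body column are hypothetical names for illustration, and an in-memory SQLite database (pdo_sqlite assumed) stands in for MySQL so the sketch runs without a server.

```php
<?php
// Hypothetical helper: the one place that knows how a text is stored.
final class TextStore
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // Stores the text unchanged and returns the new row id.
    public function add(string $text): int
    {
        $stmt = $this->db->prepare('INSERT INTO texts (body) VALUES (:body)');
        $stmt->execute([':body' => $text]);
        return (int) $this->db->lastInsertId();
    }

    // Returns the stored text, or null if the id is unknown.
    public function get(int $id): ?string
    {
        $stmt = $this->db->prepare('SELECT body FROM texts WHERE id = :id');
        $stmt->execute([':id' => $id]);
        $value = $stmt->fetchColumn();
        return $value === false ? null : (string) $value;
    }
}

// With MySQL the DSN would name the connection charset instead, e.g.
// 'mysql:host=localhost;dbname=example;charset=utf8mb4' (placeholder values).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE texts (id INTEGER PRIMARY KEY, body TEXT)');

$store = new TextStore($db);
$id = $store->add("The tag I love most in HTML is <a>! It's great.");
echo $store->get($id), "\n";
```

The calling script never escapes anything itself; quotes and tags in the text pass through to the database untouched and come back the same way.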
Another important part is the output. You didn't mention it in your example; I put it in here to show you where htmlspecialchars belongs: it makes sure that the tag in the user's input is displayed properly on the website.
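To make the output step concrete, here is a small sketch. The helper name e() is made up for this example; the explicit ENT_QUOTES flag and 'UTF-8' charset are my choice so that the behaviour does not depend on the PHP version's defaults.

```php
<?php
// Hypothetical helper: escape a value right before it is placed into HTML.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The stored text is untouched; it still contains the raw tag.
$text = 'The tag I love most in HTML is <a>!';

// Escaping happens only here, at output time.
echo 'You just posted: ', e($text), "\n";
// prints: You just posted: The tag I love most in HTML is &lt;a&gt;!
```

The browser then renders &lt;a&gt; as the literal characters <a>, which is what the user typed.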
... utf8_bin, other in latin1_swedish_ci. I take that utf8_bin ...
What you list here are collations; they only define how the data is compared and sorted.
You are probably concerned about the encoding of the columns themselves, which should be UTF-8 for text fields if your application takes UTF-8 as input, so that the database can store all input.
The example you gave suggests that you make use of ISO-8859-1 and not UTF-8, so your database fields need not be UTF-8, but they can be.
You can use any encoding in the database columns as long as it allows you to store the data of your input encoding without loss. In your example, you can store your ISO-8859-1 input texts into UTF-8 database columns.
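The "without loss" condition can be checked on the PHP side. The following sketch assumes the mbstring extension is available; it converts an ISO-8859-1 string to UTF-8 and back, which is the direction that always works.

```php
<?php
// "café" with é as a single ISO-8859-1 byte (0xE9).
$latin1 = "caf\xE9";

// ISO-8859-1 to UTF-8 is always lossless: every Latin-1 character exists in Unicode.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// In UTF-8 the é takes two bytes (0xC3 0xA9), so the byte length grows by one.
echo strlen($latin1), ' -> ', strlen($utf8), "\n";

// The round trip gives back the original bytes.
$back = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
var_dump($back === $latin1);
```

The opposite direction is the risky one: Chinese text, for example, has no ISO-8859-1 representation, so converting it down to ISO-8859-1 (which is what utf8_decode in the question does) cannot preserve it.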
Upvotes: 1
Reputation: 63442
Use mysql_real_escape_string() to sanitize data to be appended to an SQL query, and use htmlspecialchars() to sanitize data before appending it to HTML.
Upvotes: 3