Reputation: 493
I have a mysql database with questions and answers that are displayed in HTML paragraphs and buttons. The q&a contains lots of special characters eg é,...,',",ö and also some html tags like sup.
I have tried mysqli_real_escape_string, htmlentities and adding backslashed but some characters always show incorrectly on the page. Sometimes it's correct in the paragraphs but incorrect on the buttons.
What is the correct function to use to make all these special characters display correctly and when should I use it (when inserting into the database or when selecting from the database/making it into HTML?
Many thanks
Upvotes: 2
Views: 17300
Reputation: 21659
Character sets and collation
As others have stated, one of your problems could be down to character sets and collation. You need to ensure that the whole chain (input, storage and output) is correctly configured to handle the characters that you are using. UTF-8 is often a good choice, as it can handle every character in the Unicode character set.
To create a MySQL database or table using UTF-8 with case-insensitive collation:
CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
CREATE TABLE mytable ( ... )
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
Escaping
mysql_real_escape_string (I'm assuming that you are using PHP) is used to help the MySQL parser distinguish between your parameters and SQL keywords. It is used when the whole SQL command is supplied as a single string:
INSERT INTO mytable VALUES ("this \" is a double quote");
The backslash is required to help MySQL understand that the double quote in the middle of the string is in fact a literal double quote in the middle of the string, and not a closing double quote.
By escaping your data before inserting it into the database, you are directly altering that data: you are no longer storing the original data, and therefore have to process it again when you retrieve it from the database (to un-escape it).
Prepared statements
To make things easier, for both you and Mysql, you can use prepared statements instead. Prepared statements use placeholders to show MySQL exactly which parts of the SQL statement are your parameters:
$stmt = $dbh->prepare("INSERT INTO mytable VALUES (?)");
$stmt->execute(array('this " is a double quote'));
By using prepared statements, you can insert your data into the database unmodified - no messy escaping is required. This has the added advantage of significantly reducing the possibility of SQL injection. See Bill Karwin's Sql Injection Myths and Fallacies talk and slides for more information on this subject.
Output
Now that your data is stored in its original format, you are free to output it however you wish. If you are outputting HTML (to be displayed as literal HTML), then you will need to escape it prior to output. There are a number of ways to do this, including htmlspecialchars and HTML Purifier. Which method you choose depends on the source of your data, and exactly how you want it to be displayed.
Upvotes: 4
Reputation: 3905
I suspect the problem is in the character sets in use. Your mysql collation needs to support the characters you are trying to use, and your webpages need to be in a matching character set.
Most likely, your MySQL database is using a collation such as latin1_general_ci, while your webpages are supposedly being displayed in UTF-8. I would suggest you set MySQL to store data in UTF-8, and your web pages should output a header stating they are UTF-8.
Upvotes: 2
Reputation: 2620
If you make sure your database encoding and page encoding are UTF-8 then this should help most of the way along.
Upvotes: 1