Terry

Reputation: 66123

A curious case of character encoding, PHP to MySQL and back

I have been stumped by an issue, and most of the tricks I have tried simply do not work. An overview of the problem is as follows:

  1. Create table, collation set to utf8_unicode_ci. Same for columns.

  2. The page containing the form declares a character encoding of UTF-8 (in the <meta> tag). The form is set to accept the UTF-8 character set (<form action="execute.php" method="POST" accept-charset="utf-8">).

  3. Execute.php sanitizes form input using htmlspecialchars(@trim($str), ENT_QUOTES, "UTF-8"); and also runs mysql_real_escape_string($str);. It declares that the database connection should be encoded in UTF-8 (mysql_set_charset('utf-8');) and then inserts the values into the database. If I halt the database insert and echo the query, I get normal-looking output.

  4. Now the fun begins. MySQL rows display odd characters, e.g. ß turning into ÃŸ.

  5. If I retrieve the database data and present it on a page with UTF-8 encoding, the characters look jumbled (ÃŸ), too. However, when I change the page encoding to Western ISO, the characters display just fine (ß).

I suspect the problem occurs when the form data is submitted to the database, but I can't pinpoint exactly where it goes wrong.

Upvotes: 0

Views: 341

Answers (3)

Mike Brant

Reputation: 71384

You need the character set of the database table and columns set to UTF-8 as well. Collation only determines how data is sorted and compared, not how it is encoded.

Upvotes: 0

Matt S

Reputation: 15374

A few things:

  1. Do not run your posted data through htmlspecialchars or any other sanitization. Validate input, but store it as-is if it's valid.
  2. Sanitize output if needed, e.g. with htmlspecialchars.
  3. Be sure to use only multibyte-safe functions (such as the mb_* family) on UTF-8 strings. You're unlikely to run into this with modern PHP, but it's possible.
  4. Stop using the deprecated mysql library and switch to mysqli (easy) or PDO.
  5. Run SET NAMES utf8 after you initialize your database connection with mysqli.
  6. Make sure your PHP file (and any other file you include) is saved as UTF-8.
  7. Set your response header to UTF-8: header('Content-type: text/html; charset=utf-8'); You can do this from anywhere in your code if you're using output buffering.
  8. Add <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> to page <head>.
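
A minimal sketch of how points 1, 2, 4 and 5 might fit together, not a definitive implementation. The function names, the `comments` table and its `body` column are hypothetical placeholders. It uses mysqli's set_charset('utf8'), which has the same effect on the connection as SET NAMES utf8. Note that MySQL spells the charset name utf8, without a hyphen, so the mysql_set_charset('utf-8') call in the question would likely have failed, which would be consistent with the symptoms described.

```php
<?php
// Sketch only: the connection details passed in, and the `comments` table
// with a `body` column, are hypothetical placeholders.

function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $conn = new mysqli($host, $user, $pass, $db);

    // MySQL's charset name is "utf8" (or "utf8mb4"), not "utf-8".
    $conn->set_charset('utf8');

    return $conn;
}

function store_body(mysqli $conn, string $body): void
{
    // Store the validated text as-is; the bound parameter takes care of
    // SQL escaping, so no htmlspecialchars or manual escaping is needed here.
    $stmt = $conn->prepare('INSERT INTO comments (body) VALUES (?)');
    $stmt->bind_param('s', $body);
    $stmt->execute();
    $stmt->close();
}

function render_body(string $body): string
{
    // Escape only at output time, and only for the HTML context.
    return htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
}
```

With something like this, the form handler would call store_body($conn, trim($_POST['body'])) after validation, and the display page would echo render_body($row['body']) for each row it fetches.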

Upvotes: 3

poudigne

Reputation: 1766

Did you try it without htmlspecialchars()?
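
One way to check this without touching the database is a quick standalone script. The sketch below assumes the mbstring extension is available; the variable names are only illustrative. It shows what htmlspecialchars does to ß, and, for comparison, what happens when the UTF-8 bytes of ß are read as Windows-1252 (which is what MySQL's latin1 amounts to). The second case yields the two-character garble ÃŸ that this kind of mismatch typically produces.

```php
<?php
// "\xC3\x9F" is the UTF-8 byte sequence for ß, written as escapes so the
// result does not depend on how this file itself is saved.
$eszett = "\xC3\x9F";

// htmlspecialchars only rewrites & < > " and ', so ß passes through unchanged.
$escaped = htmlspecialchars($eszett, ENT_QUOTES, 'UTF-8');
var_dump($escaped === $eszett);

// Treat the same two bytes as Windows-1252 text and re-encode them as UTF-8.
// This mimics a connection or column that is not really using UTF-8.
$garbled = mb_convert_encoding($eszett, 'UTF-8', 'Windows-1252');
var_dump(bin2hex($garbled));
```

If the first check holds on your setup, htmlspecialchars is probably not what mangles the character, although removing it from the input path is still worthwhile for the reasons given in the other answer.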

Upvotes: 0
