Reputation: 33
I've developed a PHP/MySQL application in which one table stores names. These names sometimes contain special characters (like é, à, ë, ...).
When creating the table I forgot to set the collation to UTF-8, so it is now set to latin1_swedish_ci. As a result, some data isn't displayed correctly in phpMyAdmin. But when I show the names on a PHP page, those special characters are displayed correctly. Here's an extract from a PHP file where I use UTF-8:
<?php ... ?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
....
Like I said, the special characters are displayed as they should be. So far... no problem.
But now I would like to export that data into a CSV file, and guess what? The special characters aren't included in the CSV file. My PHP export file contains the following lines of code:
<?php
mysql_query("SET NAMES utf8");
header('Content-Type: text/html; charset=UTF-8');
...
But no special characters are displayed.
Does anyone have a solution for this problem? I find it a little ridiculous to open the CSV in Excel and use 'Find & Replace'. Using the HTML escape codes is out of the question. That's what UTF-8 is for, isn't it?
Upvotes: 2
Views: 1725
Reputation: 22350
You have stored UTF-8 encoded data which MySQL regards as Latin-1 data. MySQL does not complain about this because any arbitrary sequence of bytes is valid Latin-1. Because the connection used to retrieve the data has the same character set as the one used to insert it, the bytes come back unchanged and the correct data is displayed on your web page. But if you view the data in a utility that takes pains to display the actually stored characters, you will see mis-encoded text, because that is what you actually have stored.
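To make the effect concrete, here is a small sketch of what happens to one name. It assumes the mbstring extension is available, and the sample name is made up for illustration. It takes UTF-8 bytes, reinterprets each byte as a Latin-1 character (roughly what a tool that trusts the column's declared character set does), and then shows that the original bytes are still recoverable.

```php
<?php
// Hypothetical sample name; "\xC3\xA9" is the UTF-8 byte sequence for "é".
$utf8Name = "Ren\xC3\xA9e";

// Treat every byte as one Latin-1 character and render the result as UTF-8.
// This mimics a tool that believes the stored bytes are Latin-1.
$asSeenByTool = mb_convert_encoding($utf8Name, 'UTF-8', 'ISO-8859-1');

// Reversing that step gives back the original UTF-8 bytes,
// which is why the data itself is not lost, only mislabeled.
$roundTrip = mb_convert_encoding($asSeenByTool, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8Name), "\n";       // bytes as stored
echo $asSeenByTool, "\n";            // mis-encoded view of the same bytes
var_dump($roundTrip === $utf8Name);  // original bytes recovered
```

The single "é" turns into two characters in the mis-encoded view, which matches the kind of garbage you see in phpMyAdmin.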
There are two things you need to do. First, you need to change your database connection code to make sure that all connections you make to your database use the UTF-8 character set. This can be accomplished using a settings file or just by issuing a SET NAMES statement every time you connect.
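As a rough sketch of the connection step, the helper below uses the mysqli extension and its set_charset() method, which has the same effect on the connection as SET NAMES. The function name and the host, user, password and database arguments are placeholders of mine, not anything from your application.

```php
<?php
// Hypothetical helper: all four arguments are placeholders for your own settings.
function open_utf8_connection(string $host, string $user, string $pass, string $dbName): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbName);
    if ($db->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }

    // Set the connection character set right after connecting,
    // so every query on this connection sends and receives UTF-8.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }

    return $db;
}
```

Routing every connection through one helper like this means the character set cannot be forgotten in an individual script, such as the export file.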
Second, you need to correct the mis-encoded data already stored in the database. Do not use ALTER TABLE to change the character set to UTF-8 directly; if you do, MySQL will convert the bytes it believes to be Latin-1, and you will end up with double-UTF-8-encoded data. Instead, use an ALTER TABLE query to change the column to the binary character set, which makes MySQL stop interpreting the bytes, and after doing that, ALTER TABLE again to UTF-8.
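The two-step conversion could look something like the sketch below. The table name `people`, the column name `name` and the length 255 are assumptions for illustration; substitute your own. The helper only builds the two statements so you can inspect them before running anything, and it assumes the column is a plain VARCHAR. Note that MODIFY redefines the whole column, so attributes such as NOT NULL or a default would need to be repeated in both statements.

```php
<?php
// Hypothetical helper: builds the two ALTER TABLE statements for one column.
// Table name, column name and length are supplied by the caller.
function build_reencode_statements(string $table, string $column, int $length): array
{
    return [
        // Step 1: binary type, so MySQL keeps the bytes as they are.
        "ALTER TABLE `$table` MODIFY `$column` VARBINARY($length)",
        // Step 2: relabel the untouched bytes as UTF-8 text.
        "ALTER TABLE `$table` MODIFY `$column` VARCHAR($length) CHARACTER SET utf8",
    ];
}

// Print the statements for an assumed table and column.
foreach (build_reencode_statements('people', 'name', 255) as $sql) {
    echo $sql, ";\n";
}
```

Take a backup of the table first, then run the statements in order (for example with `$db->query($sql)` on a mysqli connection) and check a few names in phpMyAdmin afterwards.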
Upvotes: 2