Reputation: 4509
I see versions of this question a lot on SO, so I'll try to be explicit about what is happening here.
We have a Cake 1.2.5 app connected to a MySQL 5.1 database. The original database and table that I am trying to write to were Latin-1 but I changed the database, table, and column to all be UTF-8 (from what I understand this doesn't really matter, but I'm including it for completeness' sake).
The problem is that a Windows user who puts an en dash into our form (obtained by having MS Word autocorrect a hyphen) ends up with byte \x96 in the database (viewed using the hexl-mode hex editor in Emacs), which is the encoding of an en dash in Windows-1252 (and pretty much invalid in other common encodings).
Originally I thought this was a problem with the form input, so I did the usual round of changing the Content-Type header, checking the meta tags, and adding accept-charset to the form tag, none of which did anything. Then I tried dumping the data I was getting from the form to a file before saving it to the database, and it correctly saves the UTF-8 byte sequence for en dash, \xe2\x80\x93 (viewed in the same way), so I believe the problem is occurring when Cake talks to the database.
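The two byte sequences described above can be reproduced outside the app. This is only a small sketch in Python (not part of the Cake app); the helper names hex_bytes and is_valid_utf8 are made up for illustration.

```python
# U+2013 EN DASH, the character Word substitutes for a typed hyphen.
EN_DASH = "\u2013"


def hex_bytes(data: bytes) -> str:
    """Render bytes as space-separated hex pairs, similar to a hex editor view."""
    return " ".join(f"{b:02x}" for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


cp1252_hex = hex_bytes(EN_DASH.encode("cp1252"))  # single byte in Windows-1252
utf8_hex = hex_bytes(EN_DASH.encode("utf-8"))     # three bytes in UTF-8

print(cp1252_hex)              # 96
print(utf8_hex)                # e2 80 93
print(is_valid_utf8(b"\x96"))  # False: a lone 0x96 is not valid UTF-8
```

So a lone 96 in a hex view means the text was converted to a single-byte Windows-style encoding somewhere between the form and the viewer, not that the form sent bad data.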
Things I have tried:
Adding 'encoding'=>'utf8' to the connection definition in app/config/database.php.
Adding Configure::write('App.encoding', 'UTF-8'); to app/config/core.php.
Adding mb_internal_encoding('UTF-8'); to app/config/core.php.
In addition to possible answers, I'm interested in hearing about any assumptions I've made in this process that are invalid, as well as methods for viewing the state of the data at various stages during the process.
Upvotes: 1
Views: 1393
Reputation: 4509
The answer to this question turned out to be a problem with the character_set_client setting of all the clients I was using to view the data after it had been inserted (the MySQL command line client, Emacs SQL Mode [which is really just a wrapper for the MySQL command line client], and Python's MySQLdb library).
After running the command SHOW VARIABLES; it became apparent that the data was in fact stored in the database correctly, but all my attempts to observe the data were flawed.
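To illustrate why a misconfigured client shows \x96 even though the stored data is fine, here is a rough simulation in Python. It does not talk to MySQL at all: as_seen_by_client is a hypothetical helper that mimics the server converting a utf8 column value into the session's result character set, and it assumes MySQL's latin1 behaves like Windows-1252 for this character.

```python
def as_seen_by_client(stored_utf8: bytes, results_charset: str) -> bytes:
    """Simulate the server re-encoding a utf8 column value for the client session.

    Hypothetical helper: this only models the conversion, it is not a MySQL API.
    """
    return stored_utf8.decode("utf-8").encode(results_charset)


# What is actually stored in the utf8 column: a correct UTF-8 en dash.
stored = "\u2013".encode("utf-8")

# Assumption: a session whose result charset is latin1 gets cp1252-style bytes.
seen_latin1_client = as_seen_by_client(stored, "cp1252")

# A session configured for utf8 gets the stored bytes back unchanged.
seen_utf8_client = as_seen_by_client(stored, "utf-8")

print(seen_latin1_client)  # b'\x96'
print(seen_utf8_client)    # b'\xe2\x80\x93'
```

On the real server, SHOW VARIABLES LIKE 'character_set%'; lists the session's client, connection, and results character sets, and SELECT HEX(column) should show the stored bytes without depending on how the client renders them.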
Upvotes: 0
Reputation: 47321
If you just change the table schema from latin1 to UTF-8, this probably won't work well if your existing data contains UTF-8 characters. I'm not quite sure about CakePHP, but have you checked mysql_set_charset too? http://php.net/manual/en/function.mysql-set-charset.php
Upvotes: 1