Brent Newey

Reputation: 4509

Windows 1252 Data in UTF-8 MySQL Table Using CakePHP

I see versions of this question a lot on SO, so I'll try to be explicit about what is happening here.

We have a Cake 1.2.5 app connected to a MySQL 5.1 database. The original database and table that I am trying to write to were Latin-1 but I changed the database, table, and column to all be UTF-8 (from what I understand this doesn't really matter, but I'm including it for completeness' sake).

The problem is that when a Windows user puts an en dash into our form (produced by MS Word autocorrecting a hyphen), byte \x96 ends up in the database (viewed with the hexl-mode hex editor in Emacs). That byte is the encoding of en dash in Windows 1252 (and is pretty much invalid in other common encodings).
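To make the byte values concrete, here is a small Python sketch (not part of the app in question; Python is used only because its codecs make this easy to check) showing how the same en dash character is encoded under Windows 1252 and under UTF-8:

```python
# U+2013 EN DASH, the character MS Word substitutes for a hyphen.
EN_DASH = "\u2013"

cp1252_bytes = EN_DASH.encode("cp1252")  # one byte in Windows 1252
utf8_bytes = EN_DASH.encode("utf-8")     # three bytes in UTF-8

print(cp1252_bytes.hex())  # 96
print(utf8_bytes.hex())    # e28093

# A lone 0x96 byte is not valid UTF-8, so decoding it as UTF-8 fails.
try:
    cp1252_bytes.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

print(lone_byte_is_valid_utf8)  # False
```

So a single \x96 where an en dash should be means that something, somewhere, produced Windows 1252 bytes rather than UTF-8 ones.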

Originally I thought this was a problem with the form input, so I did the usual round of changing the Content-Type header, checking the meta tags, and adding accept-charset to the form tag. None of that changed anything. I then tried dumping the data I was getting from the form to a file before saving it to the database, and the file correctly contains the UTF-8 encoding of en dash, \xe2\x80\x93 (viewed in the same way). So I believe the problem is occurring when Cake talks to the database.
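The file-dump check described above can be sketched roughly as follows. This is an illustrative Python version, not the code actually used in the Cake app; the helper names hex_dump and classify are hypothetical.

```python
def hex_dump(raw: bytes) -> str:
    """Render raw bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in raw)


def classify(raw: bytes) -> str:
    """Give a rough guess at how any non-ASCII bytes in raw are encoded."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8 (possibly cp1252)"


# Simulated form input "1<en dash>2" as it would arrive in each encoding.
as_utf8 = "1\u20132".encode("utf-8")
as_cp1252 = "1\u20132".encode("cp1252")

print(hex_dump(as_utf8), "->", classify(as_utf8))
print(hex_dump(as_cp1252), "->", classify(as_cp1252))
```

Running the same check on the raw bytes at each stage (form input, just before the save, and what a client reads back) narrows down where the encoding changes.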

Things I have tried:

In addition to possible answers, I'm interested in hearing about any assumptions I've made in this process that are invalid, as well as methods for viewing the state of the data at various stages during the process.

Upvotes: 1

Views: 1393

Answers (2)

Brent Newey

Reputation: 4509

The answer to this question turned out to be a problem with the character_set_client setting of all the clients I was using to view the data after it had been inserted: the MySQL command line client, Emacs SQL Mode (which is really just a wrapper for the MySQL command line client), and Python's MySQLdb library.

After running the command SHOW VARIABLES; it became apparent that the data was in fact stored correctly in the database, and that it was my attempts to observe the data that were giving misleading results.
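Why a correctly stored en dash can still show up as \x96 can be illustrated with a small Python simulation. It assumes the viewing clients were connected with a latin1 character set, which MySQL treats as Windows 1252; the answer above only says that the client character set settings were at fault, so treat that detail as an assumption. The function as_seen_by_client is a hypothetical name, and no real database connection is involved.

```python
# Bytes actually stored in the UTF-8 column for an en dash.
stored = b"\xe2\x80\x93"


def as_seen_by_client(stored_utf8: bytes, client_charset: str) -> bytes:
    """Simulate the server converting stored UTF-8 text to the client's charset.

    Note: Python's "cp1252" codec is used to stand in for MySQL's latin1;
    Python's own "latin-1" codec is strict ISO-8859-1 and has no en dash.
    """
    text = stored_utf8.decode("utf-8")
    return text.encode(client_charset, errors="replace")


print(as_seen_by_client(stored, "cp1252").hex())  # 96
print(as_seen_by_client(stored, "utf-8").hex())   # e28093
```

Under that assumption, the \x96 seen in the hex editor was produced on the way out of the database, not on the way in.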

Upvotes: 0

ajreal

Reputation: 47321

If you just changed the table schema from latin1 to UTF-8, that probably won't work well if your existing data contains UTF-8 characters. I'm not quite sure about CakePHP, but have you checked mysql_set_charset as well? http://php.net/manual/en/function.mysql-set-charset.php

Upvotes: 1
