PETER BROWN

Reputation: 550

Problem encoding UTF8 data from Rails app to Mysql

I'm having trouble submitting UTF-8 data in a form and having it saved correctly in MySQL. In particular, from my Ruby application I'm posting a form that includes the following:

Gerhard Tröster

Which in my terminal I see is being updated in the database as:

UPDATE `xxxx` SET 
   `updated_at` = '2009-08-13 14:22:33', 
   `description` = '<p><span style=\"font-size: 14px; line-height: normal; white-space: pre; \">Gerhard Tr?ster</span></p>' 
WHERE `id` = 1228

However when I select from this table it says:

| description |
---------------
| Gerhard Tr | 

Note that it's simply truncating everything AFTER the umlaut, even though the insert appears to have included it (or something like it).

My database.yml has encoding set to UTF8, I've included the appropriate META tags in my HTML as well.

Upvotes: 0

Views: 2713

Answers (5)

user144457

Reputation:

Although it was already mentioned above:

Putting encoding: utf8 in database.yml solved it for me.
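
For anyone unsure where that setting lives, here is a minimal sketch. It parses an inline copy of a database.yml and reads the encoding back. The adapter and database name are made-up placeholders, not values from the question.

```ruby
require "yaml"

# Hypothetical database.yml contents; only the `encoding` key matters here.
config = YAML.load(<<~CONFIG)
  development:
    adapter: mysql2
    encoding: utf8
    database: myapp_development
CONFIG

# The encoding key controls the character set of the connection.
connection_encoding = config["development"]["encoding"]
puts connection_encoding
```

Note that this setting only covers the connection. The database, tables and columns have their own character sets.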

Upvotes: 1

sloser

Reputation: 301

To make Ruby itself a little bit Unicode-aware, you need this line:

$KCODE = 'u'

I always put this line in config/environment.rb

And your database must be created with utf8 collation and you must have encoding set to UTF8 in database.yml.

Upvotes: 0

Julik

Reputation: 7856

These issues are symptomatic of a few possible problems. Mostly nothing to do with Ruby.

1) Your form gets sent with an Accept-Charset different from UTF-8. This will happen if

  • the page the form gets sent from is itself not UTF-8, by meta tag or HTTP header (a form from a Latin-1 page will be Latin-1)
  • The form explicitly specifies that it is sent as something other than UTF-8
  • You are using Javascript to post the data and not escaping correctly, or your users do

In this case the browser might be downgrading Unicode to the charset it can send. In general, the assumed accept-charset of the form is the charset of the page that displays the form in the first place.

2) Your MySQL server is configured in a way that actively prevents you from using UTF-8 for data storage, so MySQL silently downgrades your UTF-8 to something else (say the server admin has forced MySQL to do SET NAMES SOME_CRAPPY_8BIT_CHARSET_OF_1990 on every connection. No joke - this happened to me once). Read this article, which explains how to hardwire everything for UTF-8 with 100% certainty: http://www.fngtps.com/2007/02/ruby-and-mysql-encoding-flakiness

3) Your terminal that you are looking at is not showing you UTF-8 and tries to recode it into Latin or ASCII, dropping characters it cannot display and replacing them with "?" (standard pattern). If you do "puts 'ü'" in plain Ruby with $KCODE set what do you see? Windows terminals are especially susceptible to this kind of behavior before special settings are in place.

4) You are running Ruby 1.9 whose handling of Unicode is a special matter altogether

5) Totally unlikely but who knows: you are using (or your hoster is using) some crappy proxy solution which mangles your charset headers or recodes the input being sent. I can bet on 2 and 3 with about 50% chance.
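
To make the "?" and the truncation less mysterious, here is a rough sketch of what happens to the bytes. It assumes Ruby 1.9 or later (String#encode does not exist in 1.8), and the truncation step only imitates what a non-strict MySQL is commonly seen to do with an invalid UTF-8 byte. It is not MySQL's actual code.

```ruby
# encoding: utf-8
name = "Gerhard Tröster"

# Recoding into a charset that has no "ö" gives the "?" seen in the log.
ascii = name.encode("US-ASCII", undef: :replace, replace: "?")
puts ascii # prints "Gerhard Tr?ster"

# The same text as Latin-1 bytes, wrongly labelled as UTF-8.
mislabeled = name.encode("ISO-8859-1").force_encoding("UTF-8")
puts mislabeled.valid_encoding? # prints "false"

# Keep only the bytes before the first non-ASCII byte, imitating a
# column that stops storing at the first invalid UTF-8 sequence.
first_bad = mislabeled.bytes.index { |b| b > 0x7F }
stored = mislabeled.byteslice(0, first_bad)
puts stored # prints "Gerhard Tr"
```

If the stored value stops exactly where the umlaut should be, as in the question, that points to Latin-1 bytes arriving on a connection that claims to be UTF-8 (points 1 and 2) rather than to a display problem.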

Upvotes: 1

insane.dreamer

Reputation: 2072

There are (amazingly) four places you need to set the UTF-8 encoding in order to ensure your data gets saved in that format in mysql (why they don't use utf-8 as the default is beyond me): the connection, the database, the table and the columns. Specifying utf8 in your database.yml takes care of the connection; the other three have to be set in mysql (using the character set, collate and set names commands).
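
As a rough sketch of the mysql side, the helper below just builds the statements as strings so you can review them before running anything. The helper name, database name and table name are all hypothetical. The SQL itself (SET NAMES, ALTER DATABASE ... CHARACTER SET, ALTER TABLE ... CONVERT TO CHARACTER SET) is standard MySQL syntax.

```ruby
# Hypothetical helper: returns the MySQL statements for each level.
# Database and table names are placeholders, not from the question.
def utf8_statements(database, tables, charset: "utf8", collation: "utf8_general_ci")
  stmts = []
  # Connection level (what `encoding: utf8` in database.yml asks for).
  stmts << "SET NAMES #{charset}"
  # Database level: the default for newly created tables.
  stmts << "ALTER DATABASE `#{database}` CHARACTER SET #{charset} COLLATE #{collation}"
  # Table level: CONVERT TO also converts the existing text columns.
  tables.each do |table|
    stmts << "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET #{charset} COLLATE #{collation}"
  end
  stmts
end

statements = utf8_statements("myapp_development", ["posts"])
puts statements
```

In a Rails app each string could be passed to ActiveRecord::Base.connection.execute, ideally on a backed-up copy first, since CONVERT TO rewrites the stored column data. On MySQL 5.5.3 or later you could pass charset: "utf8mb4" with a matching collation such as "utf8mb4_general_ci", because MySQL's utf8 only covers characters of up to three bytes.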

Just for good measure, you might also need to add a UTF-8 directive to your HTML head and to your environment, to make sure that it "takes" across the board.

Some helpful info here: http://word.wardosworld.com/?p=164

Upvotes: 2

markus

Reputation: 40675

The question mark in the db entry means it hasn't been stored correctly as utf8. You need to make sure that the db tables and columns have utf8 collation and that you set the connection to utf8 too. To ensure that, you can use the mysql query SET NAMES 'utf8' (MySQL's name for the character set is utf8, not UTF-8).

(Furthermore I'm wondering why you're storing all this markup in your db?)

Upvotes: 1
