thekevinscott

Reputation: 5413

UTF-8 encoding in a Rails model

I have a MySQL database, set to use UTF-8.

In my database.yml, the database is set to utf8.

I am doing some HTML scraping and inserting into the MySQL database.

If I retrieve the HTML from the database in PHP, all characters are displayed correctly:

// code
$result = mysql_query("SELECT raw_html FROM pages WHERE id = 1");
echo mysql_result($result,0);

// output
Hawaiʻi.

And the output looks great. However, in Rails, I get strange characters:

// code in the controller
@page = Page.find(params[:id])

// code in the view
<%= @page.raw_html %>

// output
HawaiÊ»i

Is there somewhere else I need to force UTF-8? I've tried using the iconv library to no avail (unless I'm using it wrong).

UPDATE: I've reproduced the same problem when using the console. So:

Page.find(2).raw_html[91..94]

"Ê»"
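A minimal sketch, assuming Ruby 1.9 or later (String#force_encoding and String#encode rather than Iconv), of how a single ʻ (U+02BB) can turn into "Ê»": its two UTF-8 bytes are misread as Latin-1 characters and then encoded as UTF-8 a second time, producing four bytes. If the app runs on Ruby 1.8, where String#[] indexes bytes, that would match the four-byte slice [91..94] above. This is an illustration of the suspected mechanism, not a confirmed diagnosis.

```ruby
# Suspected double encoding, reproduced in isolation (Ruby 1.9+).
okina = "\u02BB" # MODIFIER LETTER TURNED COMMA, the ʻ in "Hawaiʻi" (UTF-8 bytes CA BB)

# Reinterpret the two UTF-8 bytes as two Latin-1 characters,
# then encode those characters as UTF-8 again.
double = okina.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts double          # prints Ê»
puts double.bytesize # prints 4 (C3 8A C2 BB)
```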

So the problem occurs in the console (script/console) too, not just in the view, if that sheds any more light on the issue.

UPDATE 2: Okay, on further investigation I realized I had made a basic mistake, but fixing it didn't solve the problem.

While the table was set to UTF-8, the column was not. I've changed the column to 'utf8_general_ci'. However (and this makes me think I'm getting something basic wrong), this actually produces the correct result:

@raw_html = Iconv.conv('LATIN1','UTF-8',@page.raw_html[0..10000])

That comes out lovely. Unfortunately, if I run the whole page through, I get:

Iconv::IllegalSequence in PagesController#show 
"€²18″N<"...

So there's something else odd going on in there. Could it be that the data is still Latin-1 encoded, even though I've explicitly set both the table and the column to UTF-8 (and repopulated the HTML)? I'm currently using the mysql2 gem as well, per Jeffrey's suggestion.
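A minimal sketch, assuming Ruby 1.9 or later (String#encode replaces Iconv, which was dropped from the standard library in Ruby 2.0), of why converting to LATIN1 can work on part of the page but fail on the whole. MySQL's latin1 character set is actually Windows-1252, which maps byte 0x80 to €, a character ISO-8859-1 does not contain. If the stored text is UTF-8 that was misread as Windows-1252, converting back to ISO-8859-1 likely fails at the first € (the error above starts with "€"), while converting to Windows-1252 recovers the original bytes. The sample string and the double_encode helper are hypothetical, used only to simulate the corruption.

```ruby
# Reverse one round of "UTF-8 bytes misread as Windows-1252, re-encoded as UTF-8".
def undo_double_encoding(str)
  str.encode("Windows-1252").force_encoding("UTF-8")
end

# Hypothetical helper: simulate that corruption on a clean UTF-8 string.
def double_encode(str)
  str.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Sample text with ʻ (U+02BB), prime (U+2032) and double prime (U+2033).
original  = "Hawai\u02BBi 21\u203218\u2033N"
corrupted = double_encode(original) # the primes become "â€²" and "â€³"

puts undo_double_encoding(corrupted) == original # prints true

begin
  # ISO-8859-1 has no €, so this conversion raises partway through.
  corrupted.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  puts "ISO-8859-1 fails: #{e.class}"
end
```

This only reverses a single round of mis-decoding, and it may still fail on text whose UTF-8 bytes include the five values Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).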

UPDATE 3: To clarify, I'm getting the garbled output in the console as well. This is the command:

Page.find(2).raw_html[91..94]

And this is the response:

"Ê»"

Upvotes: 2

Views: 4340

Answers (3)

Richard

Reputation: 1162

In your database.yml, add encoding: utf8 to each environment's settings.
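A minimal sketch of what "each environment" means in practice, written as a small Ruby check over an example database.yml. The database names are placeholders, the mysql2 adapter matches what the asker switched to, and the production entry deliberately omits the encoding so the check has something to report.

```ruby
require "yaml"

# Example database.yml contents (placeholder database names).
DATABASE_YML = <<~YAML
  development:
    adapter: mysql2
    encoding: utf8
    database: myapp_development
  test:
    adapter: mysql2
    encoding: utf8
    database: myapp_test
  production:
    adapter: mysql2
    database: myapp_production
YAML

# Return the names of environments that do not set encoding: utf8.
def environments_missing_utf8(yaml_text)
  YAML.safe_load(yaml_text)
      .reject { |_env, settings| settings["encoding"] == "utf8" }
      .keys
end

p environments_missing_utf8(DATABASE_YML) # prints ["production"]
```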

Upvotes: 5

Jeffrey W.

Reputation: 4169

You might want to switch to mysql2 :)

Set it in both your Gemfile:

gem "mysql2"

and your database.yml:

adapter: mysql2

That should save you a lot of trouble :)

Upvotes: 2

Marc

Reputation: 448

Check that you have set the character encoding for the HTML page in your layout.

If you are using HTML5, try adding this as the first element inside the page's <head>:

<meta charset="UTF-8">

For HTML 4, try adding this to the head section of the page

<meta http-equiv="Content-type" content="text/html;charset=UTF-8">

For XHTML pages, try

<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />

if you are serving the page with the text/html MIME type, or this

<?xml version="1.0" encoding="UTF-8"?>

as the very first line of the served file if it's XHTML served as XML.

Upvotes: 0
