Ethan
Ethan

Reputation: 60179

How can I process data to avoid MySQL "incorrect string value" error?

I am trying to use a Rake task to migrate some legacy data from MS Access to MySQL. I'm working on Windows XP, using Ruby 1.8.6.

I have the encoding for Rails set as "utf8" in database.yml.

Also, the default character set for MySQL is utf8.

99% of the data is coming in fine, but every now and then I'll get a column value that gives me a error something like this:

Mysql::Error: Incorrect string value: '\x92 Comm...' for column 'name' 
  at row 1: 
  INSERT INTO `organizations` ( [...] ) 
  VALUES('Lawyers’ Committee', [...] )

It looks as though the thing that's giving MySQL trouble is the apostrophe immediately after the "s" in the word "Lawyers".

Here's another one...

Mysql::Error: Incorrect string value: '\x99 aoc' for column 'department' 
  at row 1: 
  INSERT INTO `addresses` 
[...]
  'TRInfo™ aoc'
[....]

Looks like it's choking on the "TM" after "TRInfo".

Is there any Ruby or Rails method that I can run the data through to cleanse from it any characters that MySQL will choke on?

Ideally, it would be great to replace them with more palatable characters -- replace the apostrophe with a single quote and the TM symbol with the string "(TM)".

Or, if I could somehow configure MySQL to store those characters as-is without errors that would be great too.

Upvotes: 4

Views: 19475

Answers (7)

IgorAlves
IgorAlves

Reputation: 5550

I had the same issue importing data from SQL Server to MySql using Php. My solution was utf8_encode() when inserting into MySql and use utf8_decode() when retrieving from MySql to display into the browser. Here you have my FULL code, that works good.

//For string values
$Gro2=(is_null($row["GrpNm"]))?"NULL":"\"".mysql_escape_string(utf8_encode($row["GrpNm"]))."\"";


$sqlMy ="INSERT INTO `tbl_name` VALUES ($Gro2)";

Please note: For new projects use

mysqli_escape_string()

link

Upvotes: 0

Richardhe2007
Richardhe2007

Reputation: 389

Adding binary before the weirdcolumn solves the problem.

In my case, I have an update trigger on tableA to insert data into other table. There are some special characters in column weirdcolumn, and the update failed with message: "ERROR 1366 (HY000): Incorrect string value: '\xE7....'"

After I dig in a lot, I found the solution by adding binary before the string column name, or using cast(weirdcolumn as binary);

Hope this can help.

Upvotes: 0

Zhifeng Hu
Zhifeng Hu

Reputation: 1271

I encountered the same problem today.
After tried many times, I found out the reason and fix it at last.
For applications that store data using the default MySQL character set and collation (latin1, latin1_swedish_ci), so you need to specify the character set and collation to utf8/utf8_general_ci when your create your database or table.
e.g.:
        $sql = "CREATE TABLE " . $table_name . " (
        id mediumint(9) NOT NULL AUTO_INCREMENT,
        bookname varchar(128) NOT NULL,
        author varchar(64) NOT NULL,
        PRIMARY KEY  (id),
        KEY (bookname)
        )CHARACTER SET utf8 COLLATE utf8_general_ci;";

Reference:
《mysql create table problem? SOLVED!!!!!!!!!!!》
http://forums.mysql.com/read.php?121,193883,193883
《10.1.5. Configuring the Character Set and Collation for Applications》
http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html

Hoping this can help you.

Upvotes: 0

Ondra Žižka
Ondra Žižka

Reputation: 46914

In general, this happens when you insert strings to columns with incompatible encoding/collation.

I got this error when I had TRIGGERs, which inherit server's collation for some reason. And mysql's default is (at least on Ubuntu) latin-1 with swedish collation. Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:

/etc/mysql/my.cnf :

[mysqld]
character-set-server=utf8
default-character-set=utf8

And this must list all triggers with utf8-*:

select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS

And some of variables listed by this should also have utf-8-* (no latin-1 or other encoding):

show variables like 'char%';

Upvotes: 1

lex82
lex82

Reputation: 11317

I had the same problem when putting contents of UTF-16 encoded files - which usually store one character per 16bit block - into mysql tables with java. The problem was that the UTF-16 encoded string contained so called surrogate pairs. It means two consecutive 16bit UTF-16 blocks encode one special character but cannot be translated into a corresponding UTF-8 encoding individually. See wikipedia for further explanation.

The solution was to simply replace these characters with spaces. This is the character range you might want to strip out of your string: U+D800–U+DFFF

Upvotes: 1

Kathy Van Stone
Kathy Van Stone

Reputation: 26291

It looks like your input data is not in utf-8.

I did a little investigating and the styled quote used in Lawyer's is encoded as \x92 in the Windows-1252 encoding, but would be nonsense for utf-8 (when I decoded it and encoded it into utf8, I got \xe2\x80\x99).

Thus you will need to convert the input strings from windows-1252 to utf-8 (or to unicode).

Upvotes: 5

Todd Gardner
Todd Gardner

Reputation: 13521

It looks like your old database is in one string format (utf8?) and your rails is expecting something else. If you input is in utf8, have you tried configuring your rails to support it?

Upvotes: 0

Related Questions