RedDragon
RedDragon

Reputation: 2138

Replacing weird chars in MySQL

Some of my texts are being displayed weird and I need to replace some chars on it. However I am having trouble with a specific char, the following one (javascript code, to show the difference between chars):

<script>
alert('–'.charCodeAt(0) + ':' + '-'.charCodeAt(0));
</script>

In MySQL I tried to apply the following query:

UPDATE translation SET columnx = REPLACE(columnx, '–', '-');

But it affects 0 rows. Therefore the question is, what is the right query to replace these weird chars to the correct one?

UPDATE

The weird char is displayed like this (the square):

Weird char, displayed as square

In the JSON, it is encoded as \u0096 instead of -

Upvotes: 4

Views: 2957

Answers (2)

Michel Feldheim
Michel Feldheim

Reputation: 18250

This looks like not charset but collation related. A collation defines how MySQL treats chars "almost equal" when it comes to sorting or comparation.

For example the iso-8859-15 default collation will treat ü = u

What you can do is to treat your field like a bin collation. A binary collation does not treat similar characters equal.

Select your correct binary collation

SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLLATIONS WHERE COLLATION_NAME LIKE '%bin%';

Then execute your update like this:

UPDATE TABLE SET columnx = REPLACE( columnx COLLATE latin1_bin, '–', '-' );

CORRECTION: REPLACE compares are always done using a binary collation

EDIT:

If you still get 0 rows updated you probably don't replace the correct character. Convert a string containing the character to hex and post the hex value so we can find out which char we are talking about

e.g.

SELECT HEX( columnx ) LIMIT 1;

EDIT2:

Just seen, you actually said you get \u0096 which is a control character called START OF GUARDED AREA.. what ever program this character creates.. in hex this is 0xC2 0x96. In your example query you are replacing a characer called EN DASH

It's hard to replace a control character by just pasting it, conversions might break it up. Instead you can use UNHEX( hexval ) to tell MySQL which character you mean

UPDATE TABLE SET columnx = REPLACE( columnx UNHEX( 'C296' ), '-' );

or to make this more clear (or even more confusing :)), this passes the "normal" hypen as hex value as well

UPDATE TABLE SET columnx = REPLACE( columnx UNHEX( 'C296' ), UNHEX( '2D' ) );

Upvotes: 2

Ace7k3
Ace7k3

Reputation: 424

Like Alvaro said, you should really try to change your database to the correct character set. Usually the utf-8 character set should be enough.

For further information look here: http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html

If you have no rights to do so maybe take a look at: http://dev.mysql.com/doc/refman/5.1/de/charset-convert.html and https://dba.stackexchange.com/questions/9944/mysql-transfer-iso-8859-1-to-utf-8

Upvotes: 3

Related Questions