Reputation: 2138
Some of my texts are being displayed weird and I need to replace some chars on it. However I am having trouble with a specific char, the following one (javascript code, to show the difference between chars):
<script>
alert('–'.charCodeAt(0) + ':' + '-'.charCodeAt(0));
</script>
In MySQL I tried to apply the following query:
UPDATE translation SET columnx = REPLACE(columnx, '–', '-');
But it affects 0 rows. Therefore the question is, what is the right query to replace these weird chars to the correct one?
UPDATE
The weird char is displayed like this (the square):
In the JSON, it is encoded as \u0096
instead of -
Upvotes: 4
Views: 2957
Reputation: 18250
This looks like not charset but collation related. A collation defines how MySQL treats chars "almost equal" when it comes to sorting or comparation.
For example the iso-8859-15 default collation will treat ü = u
What you can do is to treat your field like a bin collation. A binary collation does not treat similar characters equal.
Select your correct binary collation
SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLLATIONS WHERE COLLATION_NAME LIKE '%bin%';
Then execute your update like this:
UPDATE TABLE SET columnx = REPLACE( columnx COLLATE latin1_bin, '–', '-' );
CORRECTION: REPLACE compares are always done using a binary collation
EDIT:
If you still get 0 rows updated you probably don't replace the correct character. Convert a string containing the character to hex and post the hex value so we can find out which char we are talking about
e.g.
SELECT HEX( columnx ) LIMIT 1;
EDIT2:
Just seen, you actually said you get \u0096
which is a control character called START OF GUARDED AREA.. what ever program this character creates.. in hex this is 0xC2 0x96
.
In your example query you are replacing a characer called EN DASH
It's hard to replace a control character by just pasting it, conversions might break it up. Instead you can use UNHEX( hexval ) to tell MySQL which character you mean
UPDATE TABLE SET columnx = REPLACE( columnx UNHEX( 'C296' ), '-' );
or to make this more clear (or even more confusing :)), this passes the "normal" hypen as hex value as well
UPDATE TABLE SET columnx = REPLACE( columnx UNHEX( 'C296' ), UNHEX( '2D' ) );
Upvotes: 2
Reputation: 424
Like Alvaro said, you should really try to change your database to the correct character set. Usually the utf-8 character set should be enough.
For further information look here: http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html
If you have no rights to do so maybe take a look at: http://dev.mysql.com/doc/refman/5.1/de/charset-convert.html and https://dba.stackexchange.com/questions/9944/mysql-transfer-iso-8859-1-to-utf-8
Upvotes: 3