Reputation: 34006
For some reason my special characters got encoded as the following string in a mysql database:
Ã?
Which shows up as:
Ã?
But actually should show up as:
Ö
What went wrong here? I use UTF-8 everywhere.
How can I fix this without recreating all content?
Upvotes: 0
Views: 232
Reputation: 142218
For PDO, use something like
$db = new PDO('dblib:host=host;dbname=db;charset=UTF-8', $user, $pwd);
Ã?
is two or three things going wrong, not just one!
C396
is the utf8 hex for Ö
or the latin1 hex for the two characters Ö
. It requires something else to go wrong to get ?
or the black diamond.
Let's see what is in the table; do
SELECT col, HEX(col) FROM tbl WHERE ...
(If you have already done the previously suggested replace()
, then the table may be in an even worse mess. Or it might be fixed.)
Upvotes: 0
Reputation: 350137
I executed the following in PHP
:
<?php
echo str_replace("&", "&", htmlentities("Ö", 0, "ISO-8859-1")) , '<br />';
echo str_replace("&", "&", htmlentities("Ö", 0, "UTF-8")), "</br>";
?>
The str_replace
is just there to reveal any HTML
mnemonics, which would otherwise
be translated by the browser to the original character, which I don't want to happen.
You will get this as output:
�
Ö
You'll recognise the first value as what you found in the database, and the second one
is a bit like you wanted it to be.
Add to this the fact that the default value for the third argument to htmlentities
depends on your PHP
version and is ISO-9959-1
in the case of version 5.3, the one you use.
Also realise that HTML
documents which do not specify a character encoding will
by default post form data in ISO-8859-1
format.
Combining all this might give a clue about the cause of your problem:
My guess is that the data is correctly posted as UTF-8
to the server, but then htmlentities
interprets this as a non-UTF-8
, single byte encoding, and so turns one, multi-byte character into two single byte characters.
Now to the measures to take that this does not continue to happen:
First make sure that your HTML
form has the UTF-8
encoding, because this determines the
default encoding that a form will use for sending its data to the server:
<head>
<meta charset="UTF-8">
</head>
Make sure this is not overruled by another encoding in the form tag's accept-charset
attribute.
Then, skip the htmlentities
call. You should not turn characters into their
HTML mnemonic
when storing them in the database. MySql
supports UTF-8
characters, so just store them like that.
For the second question, you'll have to find all cases and bulk replace them as you find
new instances. You could get get a little help by producing some SQL
statements
with a PHP
script like the following:
<?php
// list all your non-ASCII characters here. Do not use str_split.
$chars = ["Ö","õ","Ũ","ũ"];
foreach ($chars as $ch) {
$bad = str_replace("&", "&", htmlentities($ch, 0, "ISO-8859-1"));
echo "update mytable set myfield = replace(myfield, '$bad', '$ch')
where instr(myfield, '$bad') > 0;<br />";
}
?>
The output of this script will look like this:
update mytable set myfield = replace(myfield, 'Ã�', 'Ö') where instr(myfield, 'Ã�') > 0;
update mytable set myfield = replace(myfield, 'õ', 'õ') where instr(myfield, 'õ') > 0;
update mytable set myfield = replace(myfield, 'Ũ', 'Ũ') where instr(myfield, 'Ũ') > 0;
update mytable set myfield = replace(myfield, 'Å©', 'ũ') where instr(myfield, 'Å©') > 0;
Of course, you could decide to make a PHP
script that will even do the updates itself.
Hopefully you can use this information to fix the issues.
Upvotes: 2