Reputation: 3311
Our content people have been writing in Word and pasting text into the old Unicode system. I'm now trying to move to UTF-8.
However, upon importing the data there are characters I cannot get rid of.
I have tried the following Stack Overflow thread, and none of the functions provided there fix this string: http://snipplr.com/view.php?codeview&id=11171 / How to replace Microsoft-encoded quotes in PHP
String: Danâ??s back for more!!
Upvotes: 0
Views: 637
Reputation: 401182
In this kind of situation, I generally start with the string I have copy-pasted from Word:
$str = 'Danâ’s back !';
var_dump($str);
Then, going through it byte by byte, I output the hexadecimal code of each byte:
for ($i = 0; $i < strlen($str); $i++) {
    $byte = $str[$i];   // one raw byte of the string
    $code = ord($byte); // its numeric value (0-255)
    printf('%s:0x%02x ', $byte, $code);
}
This gives output such as the following:
D:0x44 a:0x61 n:0x6e �:0xc3 �:0xa2 �:0xe2 �:0x80 �:0x99 s:0x73 :0x20 b:0x62 a:0x61 c:0x63 k:0x6b :0x20 !:0x21
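A more compact way to get the same byte dump is sketched below, using the built-in bin2hex and chunk_split functions. The helper name hexDump is hypothetical (it is not part of the answer above), and the test string is written with \x escapes so the result does not depend on how the source file itself is encoded.

```php
<?php
// Hypothetical helper: returns the bytes of $str as space-separated hex pairs.
function hexDump(string $str): string
{
    // bin2hex() yields one long hex string; chunk_split() adds a space after
    // every two hex digits; rtrim() removes the trailing space.
    return rtrim(chunk_split(bin2hex($str), 2, ' '));
}

// Same bytes as the pasted string: "Dan", 0xc3 0xa2, 0xe2 0x80 0x99, "s back !"
$str = "Dan\xc3\xa2\xe2\x80\x99s back !";
echo hexDump($str), "\n";
// prints: 44 61 6e c3 a2 e2 80 99 73 20 62 61 63 6b 20 21
```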
Then, with a bit of guessing, luck, and trial and error, you'll find that:
â is a character that fits in two bytes: 0xc3 0xa2
’ is a character that fits in three bytes: 0xe2 0x80 0x99
Hint: it's easier when you don't have two special characters following each other ;-)
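If the string is valid UTF-8, some of the guessing can be avoided by letting PHP split it into characters first and then dumping the bytes of each character. The sketch below assumes PCRE was built with UTF-8 support (the usual case); the function name bytesPerCharacter is hypothetical.

```php
<?php
// Hypothetical helper: returns one hex string per UTF-8 character, so that
// multi-byte characters show up as a single group of bytes.
function bytesPerCharacter(string $str): array
{
    // The /u modifier makes PCRE treat $str as UTF-8, so an empty pattern
    // splits between characters rather than between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() fails when the input is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return array_map('bin2hex', $chars);
}

echo implode(' ', bytesPerCharacter("Dan\xc3\xa2\xe2\x80\x99s")), "\n";
// prints: 44 61 6e c3a2 e28099 73
```

Here the two special characters are separated cleanly even though they follow each other: c3a2 is one character and e28099 is the next.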
After that, it's only a matter of using str_replace to replace the correct sequence of bytes with another character; for example, to replace the special quote with a normal one:
var_dump(str_replace("\xe2\x80\x99", "'", $str));
This will give you:
string 'Danâ's back !' (length=14)
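When several Word characters need the same treatment, the same idea can be applied to a whole map of byte sequences in one pass. The following is only a sketch: the function name replaceWordPunctuation and the choice of characters in the map are assumptions, and the map should be extended with whatever sequences the byte dump actually reveals in your data.

```php
<?php
// Hypothetical helper: replaces a few UTF-8 "smart" punctuation sequences
// with plain ASCII equivalents.
function replaceWordPunctuation(string $str): string
{
    $map = [
        "\xe2\x80\x98" => "'",   // U+2018 left single quotation mark
        "\xe2\x80\x99" => "'",   // U+2019 right single quotation mark
        "\xe2\x80\x9c" => '"',   // U+201C left double quotation mark
        "\xe2\x80\x9d" => '"',   // U+201D right double quotation mark
        "\xe2\x80\xa6" => '...', // U+2026 horizontal ellipsis
    ];
    // strtr() with an array replaces all keys in a single pass.
    return strtr($str, $map);
}

echo replaceWordPunctuation("Dan\xe2\x80\x99s back !"), "\n";
// prints: Dan's back !
```

Note that this only handles the special quote; the stray â (0xc3 0xa2) in front of it would still need its own entry in the map, or a fix at the point where the data was mis-decoded.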
Upvotes: 3