Reputation: 1600
I am getting characters like "�" in a description.
To convert them I tried utf8_encode(), but it turns these characters into another weird pattern. I also tried a regex and setting the character set, but neither worked.
Is there a quick solution to this issue?
Thanks.
Upvotes: 0
Views: 225
Reputation: 3881
Most likely, your string contains characters encoded using the UTF-8 character set. UTF-8 has some multibyte characters. For example, the Euro symbol € is represented in UTF-8 with the three bytes E2, 82, AC.
But your software is interpreting the string using a one-byte encoding, such as ISO-8859-1. This causes each byte of the 3-byte character to be interpreted as a separate character. E2, for example, is being displayed as â, when it is actually only the first byte of a 3-byte character.
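To see the bytes involved, here is a minimal sketch. It assumes the mbstring extension is available, and it writes the Euro symbol as explicit byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// The Euro sign as its three UTF-8 bytes.
$euro = "\xE2\x82\xAC";

echo bin2hex($euro), "\n";             // e282ac
echo strlen($euro), "\n";              // 3 (bytes)
echo mb_strlen($euro, 'UTF-8'), "\n";  // 1 (character)

// ISO-8859-1 maps every byte to the code point with the same number,
// so a one-byte reader sees three separate characters here.
foreach (str_split($euro) as $byte) {
    printf("byte %02X read as ISO-8859-1 is U+%04X\n", ord($byte), ord($byte));
}
```

The first byte, E2, maps to U+00E2, which is â. That is why a stray â tends to show up wherever a multibyte character should be.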
utf8_encode() is not the solution to this. It takes an ISO-8859-1 encoded string and returns a UTF-8 string. You already have a UTF-8 string.
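A short sketch of why that makes things worse. It uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode() (utf8_encode() and utf8_decode() are deprecated as of PHP 8.2). The variable names are only for illustration, and the mbstring extension is assumed.

```php
<?php
// A string that is already UTF-8: the Euro sign.
$original = "\xE2\x82\xAC";

// Treating it as ISO-8859-1 and encoding again turns each of the
// three bytes into its own two-byte UTF-8 sequence.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($double), "\n";  // c3a2c282c2ac
echo strlen($double), "\n";   // 6

// The damage is reversible as long as nothing else touched the bytes.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $original);  // bool(true)
```

This is the "another weird pattern" from the question: the string has been encoded twice.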
You have a couple of options.
One, fix whatever uses the string so that it expects the string to contain UTF-8. That will properly preserve the characters that are in the string. For example, if you are writing the string as part of a web page, ensure that the webpage's character encoding is UTF-8.
Two, convert the string to whatever encoding you are actually using. For example, you can convert the string from UTF-8 to ISO-8859-1 with utf8_decode(). The disadvantage is that ISO-8859-1 cannot represent as many different characters as UTF-8, so some characters will simply be lost in the decoding.
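Both options can be sketched roughly as follows. render_page() and to_latin1() are hypothetical helper names made up for this example, and the mbstring extension is assumed. mb_convert_encoding() stands in for utf8_decode(), which is deprecated as of PHP 8.2.

```php
<?php
// Sample description "café €", written as explicit UTF-8 bytes.
$description = "caf\xC3\xA9 \xE2\x82\xAC";

// Option one: leave the data as UTF-8 and tell the browser so.
// render_page() is a hypothetical helper, not a PHP built-in.
function render_page(string $description): string
{
    $safe = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head><meta charset=\"UTF-8\"></head>\n"
         . "<body><p>{$safe}</p></body>\n</html>\n";
}

// Option two: convert to the one-byte encoding the consumer expects.
// to_latin1() is also a hypothetical helper.
function to_latin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// é fits in ISO-8859-1 (single byte E9). The Euro sign does not, so
// mbstring replaces it with its substitute character ("?" by default).
$latin1 = to_latin1($description);

// Declare UTF-8 in the HTTP header as well as in the markup.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_page($description);
```

Option one is usually the better choice, because nothing is lost. Option two only makes sense when the consumer really cannot handle UTF-8.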
Upvotes: 2
Reputation: 928
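Try this function, which I wrote when I was dealing with UTF-8. It replaces every character in the range U+0080 to U+3000 with its HTML entity, so the output is plain ASCII:
    function removeuni($content) {
        // Collect every character between U+0080 and U+3000.
        // The /u modifier makes the pattern and $content be treated as UTF-8.
        preg_match_all("/[\x{80}-\x{3000}]/u", $content, $matches);
        foreach ($matches[0] as $match) {
            // Swap each matched character for its HTML entity.
            // Note: the "HTML-ENTITIES" encoding name is deprecated as of PHP 8.2.
            $content = str_replace($match, mb_convert_encoding($match, "HTML-ENTITIES", "UTF-8"), $content);
        }
        return $content;
    }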
Upvotes: 0