mushfiq

Reputation: 1600

Removing unwanted characters like "�" in PHP

I am getting characters like "�" in a description.

To convert them I tried utf8_encode(), but it turns these characters into another weird pattern. I also tried a regex and setting the character set, but neither worked.

Is there a quick way to fix this?

Thanks.

Upvotes: 0

Views: 225

Answers (2)

MetaEd

Reputation: 3881

Most likely, your string is encoded in UTF-8. UTF-8 represents many characters with more than one byte. For example, the Euro symbol is represented in UTF-8 with the three bytes E2, 82, AC.

But your software is interpreting the string using a one-byte encoding, such as ISO-8859-1. This causes each byte of the 3-byte character to be interpreted as a separate character. E2, for example, is being displayed as â, when it is actually only the first byte of a 3-byte character.
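To make the byte-versus-character mismatch concrete, here is a minimal sketch. It assumes the mbstring extension is available, and the variable names are only for illustration.

```php
<?php
// The Euro sign written as its raw UTF-8 bytes: E2 82 AC.
$euro = "\xE2\x82\xAC";

$hex          = bin2hex($euro);                 // the bytes as hex
$byteCount    = strlen($euro);                  // strlen() counts bytes
$asUtf8Chars  = mb_strlen($euro, 'UTF-8');      // characters when read as UTF-8
$asLatinChars = mb_strlen($euro, 'ISO-8859-1'); // characters when read as ISO-8859-1

echo $hex, "\n";          // e282ac
echo $byteCount, "\n";    // 3
echo $asUtf8Chars, "\n";  // 1
echo $asLatinChars, "\n"; // 3
```

The same three bytes count as one character under UTF-8 and as three under ISO-8859-1, which is why a single symbol shows up as several odd ones.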

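utf8_encode() is not the solution to this. It takes an ISO-8859-1 encoded string and returns a UTF-8 string. You already have a UTF-8 string, so utf8_encode() encodes each of its bytes a second time, which is why you got another weird pattern.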
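A small sketch of that double encoding follows. Since utf8_encode() is deprecated as of PHP 8.2, it uses mb_convert_encoding() from ISO-8859-1 to UTF-8 as a stand-in, which performs the same conversion. It assumes mbstring is loaded.

```php
<?php
// The Euro sign, already valid UTF-8 (bytes E2 82 AC).
$euro = "\xE2\x82\xAC";

// Treat those bytes as ISO-8859-1 and convert to UTF-8, as utf8_encode() would.
$double = mb_convert_encoding($euro, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n"; // c3a2c282c2ac

// Converting back the other way undoes the damage.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo bin2hex($restored), "\n"; // e282ac
```

Each of the three original bytes became its own two-byte UTF-8 character, so the string grew from three bytes to six.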

You have a couple of options.

One, fix whatever uses the string so that it expects the string to contain UTF-8. That will properly preserve the characters that are in the string. For example, if you are writing the string as part of a web page, ensure that the webpage's character encoding is UTF-8.
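For the web page case, option one might look like the sketch below. The function renderDescription() is a hypothetical helper made up for this example, and it assumes mbstring is available.

```php
<?php
// Hypothetical helper, not part of any library: wraps a description in a
// minimal HTML page that declares UTF-8.
function renderDescription(string $description): string
{
    // Fail early if the bytes are not valid UTF-8.
    if (!mb_check_encoding($description, 'UTF-8')) {
        throw new InvalidArgumentException('Description is not valid UTF-8');
    }

    // Escape for HTML, telling PHP the input is UTF-8.
    $safe = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n<html>\n<head><meta charset=\"UTF-8\"></head>\n"
         . "<body><p>{$safe}</p></body>\n</html>\n";
}

// Declare the encoding in the HTTP header too, before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo renderDescription("Price: \xE2\x82\xAC5");
```

The string itself is never converted here. Only the declared encoding of the page changes, so the browser decodes the bytes the way they were written.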

Two, convert the string to whatever encoding you are actually using. For example, you can convert the string from UTF-8 to ISO-8859-1 with utf8_decode(). The disadvantage is that ISO-8859-1 cannot represent as many different characters as UTF-8, so any character it lacks, such as the Euro symbol, will be lost in the conversion.
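Here is a rough sketch of option two and its lossiness. It uses mb_convert_encoding() instead of utf8_decode(), because utf8_decode() is deprecated as of PHP 8.2. It assumes mbstring is loaded with its default substitute character.

```php
<?php
// "café" in UTF-8: the é is the two bytes C3 A9.
$cafe = "caf\xC3\xA9";

// é exists in ISO-8859-1, so it survives as the single byte E9.
$latin1 = mb_convert_encoding($cafe, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n"; // 636166e9

// The Euro sign has no ISO-8859-1 equivalent, so it is replaced
// by mbstring's substitute character ("?" by default).
$euro = "\xE2\x82\xAC";
$lost = mb_convert_encoding($euro, 'ISO-8859-1', 'UTF-8');
echo $lost, "\n";
```

Once a character has been replaced this way, the original cannot be recovered from the converted string.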

Upvotes: 2

Noodles

Reputation: 928

Try this function, which I wrote when I was dealing with UTF-8. It replaces every character from U+0080 to U+3000 with its HTML entity:

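function removeuni($content){
  // Find every character from U+0080 to U+3000 (the /u flag treats $content as UTF-8).
  preg_match_all('/[\x{80}-\x{3000}]/u', $content, $matches);

  // Replace each matched character with its HTML entity.
  foreach($matches[0] as $match){
    $content = str_replace($match, mb_convert_encoding($match, "HTML-ENTITIES","UTF-8"), $content);
  }

  return $content;
}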
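Note that the "HTML-ENTITIES" encoding in mbstring is deprecated as of PHP 8.2. A possible alternative, sketched here under the assumption that numeric entities are acceptable, is mb_encode_numericentity(). The name removeuniNumeric() is made up for this example. Unlike the function above, it emits numeric entities such as &#8364; rather than named ones such as &euro;.

```php
<?php
// Hypothetical variant of removeuni(); covers the same U+0080..U+3000 range.
function removeuniNumeric(string $content): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x3000, 0, 0xFFFF];

    return mb_encode_numericentity($content, $map, 'UTF-8');
}

echo removeuniNumeric("Price: \xE2\x82\xAC5"), "\n"; // Price: &#8364;5
```

This does the replacement in a single pass and leaves plain ASCII untouched.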

Upvotes: 0
