Filippo oretti

Reputation: 49817

Remove or Encode Non-UTF-8 Characters

Is there a function to remove all non-UTF-8 characters from a string?

Upvotes: 2

Views: 14840

Answers (1)

Pekka

Reputation: 449385

If you have a UTF-8 string that might contain invalid characters, you can use iconv to remove those. This should work:

$text = iconv("utf-8", "utf-8//ignore", $text);
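For what it's worth, here is a slightly more defensive sketch of the same idea. The helper names `is_valid_utf8` and `strip_invalid_utf8` are made up for illustration. It assumes the iconv extension is available, and it checks the return value because `iconv()` returns `false` on failure and the handling of `//IGNORE` has differed between iconv implementations:

```php
<?php
// Hypothetical helper: true if $text is well-formed UTF-8.
function is_valid_utf8(string $text): bool
{
    // With the /u modifier PCRE validates the subject as UTF-8 before matching;
    // preg_match() returns false for malformed input and 1 when the empty pattern matches.
    return preg_match('//u', $text) === 1;
}

// Hypothetical helper: drop byte sequences that are not valid UTF-8.
function strip_invalid_utf8(string $text): string
{
    if (is_valid_utf8($text)) {
        return $text; // nothing to clean
    }

    // @ hides the notice some builds raise for illegal input; the result is checked instead.
    $clean = @iconv('UTF-8', 'UTF-8//IGNORE', $text);
    if ($clean === false) {
        throw new RuntimeException('iconv could not clean the string on this system');
    }
    return $clean;
}
```

Valid strings pass through untouched, so iconv is only invoked when something is actually wrong with the input.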

Making them visible with an arbitrary placeholder is a bit tougher. I can't think of any easy way to do that, short of walking through every byte and checking whether it is part of a valid character. The Wikipedia article on UTF-8 provides more info on how to do that.
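A rough sketch of that byte-walking approach, in case it helps. The function name `replace_invalid_utf8` and the `?` default are arbitrary choices for illustration. It uses a byte-oriented regular expression (no `/u` modifier) that matches either one well-formed UTF-8 sequence or a single stray byte, and swaps each stray byte for the placeholder:

```php
<?php
// Hypothetical helper: replace every byte that is not part of a
// well-formed UTF-8 sequence with $placeholder.
function replace_invalid_utf8(string $text, string $placeholder = '?'): string
{
    // Group 1: exactly one well-formed UTF-8 sequence (no overlongs, no surrogates,
    // nothing above U+10FFFF). The trailing "." catches any other single byte.
    $pattern = '/
        ( [\x00-\x7F]
        | [\xC2-\xDF][\x80-\xBF]
        | \xE0[\xA0-\xBF][\x80-\xBF]
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        | \xED[\x80-\x9F][\x80-\xBF]
        | \xF0[\x90-\xBF][\x80-\xBF]{2}
        | [\xF1-\xF3][\x80-\xBF]{3}
        | \xF4[\x80-\x8F][\x80-\xBF]{2}
        )
        | .
    /xs';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($placeholder): string {
            // $m[1] is only set when a valid sequence matched.
            return isset($m[1]) ? $m[1] : $placeholder;
        },
        $text
    );

    if ($result === null) {
        throw new RuntimeException('preg_replace_callback failed');
    }
    return $result;
}
```

Each invalid byte gets its own placeholder, so a truncated multi-byte character shows up as several placeholders. Passing an empty string as the placeholder removes the bytes instead of marking them.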

Upvotes: 9
