PHP: Multibyte Unicode conversion

Question

I've been googling for a bit and also searched here, but I can't find a solution. I'm using PHP. I'm reading a text string (part of an X509 cert) in which é is encoded as \xC3\xA9 (André => Andr\xC3\xA9).

I've tried MonkeyPhysics's solution:

preg_replace("#(\\\\x[0-9A-F]{2})#ei", "chr(hexdec('\\1'))", $string);

but then I get AndrÃ©

I've also played around with the replacement part:

mb_convert_encoding('&#' . hexdec('\1') . ';', 'ISO-8859-1', 'UTF-8')

(I also varied the to_encoding and from_encoding arguments.)

I've also looked at "How to transliterate non-latin scripts?" but got no closer.

Surely this should be a standard conversion?

anubhava · Accepted Answer

The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0. Use preg_replace_callback instead, with the /u modifier for handling Unicode strings.

$string = 'His nickname was \xE2\x80\x98the Angel\xE2\x80\x99,
which is kind of a clich\xC3\xA9 in my opinion.';

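// Match a literal backslash, "x", and two hex digits; capture only the digits.
// Single quotes keep PHP from collapsing the backslashes before PCRE sees them.
$repl = preg_replace_callback('#\\\\x([0-9A-F]{2})#ui',
           function ($m) { return chr(hexdec($m[1])); }, $string);

echo $repl;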

OUTPUT:

His nickname was ‘the Angel’,
which is kind of a cliché in my opinion.
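
The "AndrÃ©" result in the question is what the UTF-8 bytes C3 A9 look like when they are read as ISO-8859-1, so the decoded string has to be output and interpreted as UTF-8. A minimal sketch of that check follows. The helper name decode_hex_escapes is hypothetical (it is not in the original answer), and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper (not from the original answer): turns literal
// "\xHH" escape sequences into the raw bytes they describe.
function decode_hex_escapes(string $input): string
{
    // In a single-quoted PHP string, '\\\\x' reaches PCRE as \\x,
    // i.e. a literal backslash followed by "x".
    $result = preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $input
    );

    // preg_replace_callback returns null on failure; fall back to the input.
    return $result ?? $input;
}

$decoded = decode_hex_escapes('Andr\xC3\xA9');

// Hex dump of the result: "é" is the two bytes c3 a9.
echo bin2hex($decoded), "\n";

// The bytes form valid UTF-8.
var_dump(mb_check_encoding($decoded, 'UTF-8'));

// Misreading the same bytes as ISO-8859-1 reproduces the mojibake.
echo mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1'), "\n";
```

If the decoded text still displays as "AndrÃ©", the conversion has probably worked and the page or terminal is reading the bytes as ISO-8859-1; declaring the output charset as UTF-8 is the usual remedy.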
