TungstenX

Reputation: 880

PHP: Multibyte Unicode conversion

I've been googling for a while and have also searched here, but I can't find a solution. I'm using PHP. I'm reading a text string (part of an X509 cert) in which é is encoded as \xC3\xA9 (André => Andr\xC3\xA9).

I've tried MonkeyPhysics's solution:

preg_replace("#(\\\x[0-9A-F]{2})#ei", "chr(hexdec('\\1'))", $string);

but then I get AndrÃ©

I've played around with the replacement part:

mb_convert_encoding('&#' . hexdec('\\1') . ';', 'ISO-8859-1', 'UTF-8')

(I've also tried varying the to_encoding and from_encoding arguments.)

I've also looked at "How to transliterate non-latin scripts?" but got no closer.

Surely this should be a standard conversion?

Upvotes: 1

Views: 261

Answers (1)

anubhava

Reputation: 785581

The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0. Use preg_replace_callback instead, with the /u modifier for handling Unicode strings:

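// Sample input with literal \xHH sequences (single quotes keep them unescaped)
$string = 'His nickname was \xE2\x80\x98the Angel\xE2\x80\x99,
which is kind of a clich\xC3\xA9 in my opinion.';

// Match a literal backslash, "x" and two hex digits; capture only the digits
$repl = preg_replace_callback('#\\\\x([0-9A-F]{2})#ui',
           function ($m) { return chr(hexdec($m[1])); }, $string);

echo $repl;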

OUTPUT:

His nickname was ‘the Angel’,
which is kind of a cliché in my opinion.
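To check that the replacement really produces UTF-8 rather than something that only looks right in one terminal, the same idea can be wrapped in a small function and the resulting bytes inspected. This is a minimal sketch, not part of the original answer: the function name decodeHexEscapes is hypothetical, and it assumes the mbstring extension is loaded (the question already uses mb_convert_encoding).

```php
<?php
// Hypothetical helper (the name is illustrative only).
// Replaces every literal "\xHH" sequence with the single byte it names.
function decodeHexEscapes(string $input): string
{
    $result = preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',      // literal backslash, "x", two hex digits
        function (array $m): string {
            return chr(hexdec($m[1]));  // $m[1] holds only the two hex digits
        },
        $input
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $input;
}

$decoded = decodeHexEscapes('Andr\xC3\xA9');

// The bytes C3 A9 together are the UTF-8 encoding of "é".
echo bin2hex($decoded), "\n";                                // 416e6472c3a9
echo mb_check_encoding($decoded, 'UTF-8') ? 'valid UTF-8' : 'not UTF-8', "\n";
```

If the decoded string is valid UTF-8 but still displays as two odd characters per accented letter, the likely cause is that the page or terminal is reading the bytes as ISO-8859-1, so the fix would be in the output charset rather than in the replacement itself.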

Upvotes: 1
