shyam

Reputation: 6771

PHP utf encoding problem

How can I encode strings in UTF-16BE format in PHP? For "Demo Message!!!", the encoded string should be '00440065006D006F0020004D00650073007300610067006'. I also need to encode Arabic characters in this format.

Upvotes: 1

Views: 1047

Answers (2)

VolkerK

Reputation: 96159

For example, use the mbstring extension and its mb_convert_encoding() function. Without a third argument, the input is assumed to be in mbstring's internal encoding.

$in = 'Demo Message!!!';
$out = mb_convert_encoding($in, 'UTF-16BE');

for($i=0; $i<strlen($out); $i++) {
  printf("%02X ", ord($out[$i]));
}

prints

00 44 00 65 00 6D 00 6F 00 20 00 4D 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21 

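Or use iconv(). Its first argument is the encoding of the input string, so pass 'UTF-8' instead of 'iso-8859-1' if your input (for example, Arabic text) is UTF-8: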

$in = 'Demo Message!!!';
$out = iconv('iso-8859-1', 'UTF-16BE', $in);

for($i=0; $i<strlen($out); $i++) {
  printf("%02X ", ord($out[$i]));
}
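Since the question also asks about Arabic, here is a minimal sketch of the same iconv() approach with Arabic input. It assumes the source string is UTF-8 (for example, the PHP file itself is saved as UTF-8); the sample word is only an illustration.

```php
<?php
// Assumption: $in is UTF-8 encoded; the sample word is illustrative only.
$in = 'مرحبا';

// Convert from UTF-8 to big-endian UTF-16 (no byte order mark is added).
$out = iconv('UTF-8', 'UTF-16BE', $in);

// Dump each byte as two uppercase hex digits.
for ($i = 0; $i < strlen($out); $i++) {
    printf("%02X ", ord($out[$i]));
}
```

This should print 06 45 06 31 06 2D 06 28 06 27, i.e. two bytes per Arabic letter (U+0645, U+0631, U+062D, U+0628, U+0627).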

Upvotes: 0

Pascal MARTIN

Reputation: 401002

First of all, this is absolutely not UTF-8, which is just a character encoding (i.e. a way to store strings in memory and display them).

What you have here looks like a dump of the bytes that make up each character.

If so, you could get those bytes this way:

$str = utf8_encode("Demo Message!!!");

for ($i=0 ; $i<strlen($str) ; $i++) {
    $byte = $str[$i];
    $char = ord($byte);
    printf('%02x ', $char);
}

And you'd get the following output:

44 65 6d 6f 20 4d 65 73 73 61 67 65 21 21 21 

But, once again, this is not UTF-8: in UTF-8, as you can see in the example I've given, `D` is stored in a single byte: `0x44`.

In what you posted, it's stored using two bytes: 0x00 0x44.

Maybe you're using some kind of UTF-16?



EDIT after a bit more testing and @aSeptik's comment: this is indeed UTF-16.

To get the kind of dump you posted, you have to make sure your string is encoded in UTF-16, which can be done, for example, with the mb_convert_encoding function:

$str = mb_convert_encoding("Demo Message!!!", 'UTF-16', 'UTF-8');

Then, it's just a matter of iterating over the bytes that make up this string and dumping their values, like I did before:

for ($i=0 ; $i<strlen($str) ; $i++) {
    $byte = $str[$i];
    $char = ord($byte);
    printf('%02x ', $char);
}

And you'll get the following output:

00 44 00 65 00 6d 00 6f 00 20 00 4d 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21 

Which looks a lot like what you posted :-)

(You just have to remove the space in the printf format string; I left it there to make the output easier to read.)
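To get the contiguous uppercase string from the question in one step, the conversion and the hex dump can be wrapped in a small helper. This is only a sketch: utf16beHex() is a hypothetical name, not part of PHP, and it assumes the input is UTF-8. It names 'UTF-16BE' explicitly so the byte order does not depend on any default.

```php
<?php
// Hypothetical helper (not from the original answer): returns the UTF-16BE
// bytes of a string as one uppercase hex string, without separators.
function utf16beHex(string $utf8): string
{
    // Assumption: the input is valid UTF-8.
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

    // bin2hex() emits two lowercase hex digits per byte.
    return strtoupper(bin2hex($utf16));
}

echo utf16beHex('Demo Message!!!'), "\n";
```

For 'Demo Message!!!' this should print 00440065006D006F0020004D006500730073006100670065002100210021.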

Upvotes: 5
