Ryan Kennedy

Reputation: 3615

PHP inserting meaningless characters into string

I'm working on an application where I need to send a string from JavaScript to PHP. At first I just sent it as a POST parameter and read it from $_POST, but I noticed that extra bytes were being inserted into the string.

Then I recoded both parts to use base 64 to send the data because I assumed that the data was being mangled in transmission. The string was being transferred just fine with no mistakes, but then when I converted it back from base 64 to base 16, it had the exact same mistakes as before!

Here is a comparison of the two strings (hex dumps). There are two meaningful chunks of data in each string, and it seems PHP is making mistakes only near those areas. The first line is how PHP is interpreting the string, and the second line is how I'm sending it from JavaScript.

c2 b0 c2 a7 c3 9a 7a 00 00 00 c2 bb 30 4d 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0 a7 da 7a 00 00 00 00 00 00 bb 30 4d 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Why are these phantom bytes appearing?

EDIT: Here's the code I'm using.

JavaScript:

function sendRequest(body)
{
    var url = "../update/index.php";
    $.post(url,{msg:body},function(data,status,jqx)
    {
        $("#response").html(data);
    });
}

body is a string whose beginning is hard-coded as "\u00b0\u00a7\u00da\u007a".

Then in PHP:

$msg = $_POST['msg'];
plain_hex_dump($msg);

plain_hex_dump simply outputs the string as hex, resulting in the first of the two hex dumps above.

Upvotes: 2

Views: 151

Answers (1)

Andrew Leach

Reputation: 12973

This is UTF-8 encoding.

b0 is encoded as c2 b0. a7 becomes c2 a7. da becomes c3 9a. 7a is not changed.

Thus your b0 a7 da 7a is represented in UTF-8 as c2 b0 c2 a7 c3 9a 7a.
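The mapping above can be checked on the JavaScript side. This is only a minimal sketch: `utf8Hex` is a hypothetical helper name, and it assumes an environment where `TextEncoder` is available (modern browsers and recent Node.js).

```javascript
// Hypothetical helper: encode a string as UTF-8 and return the bytes as hex.
function utf8Hex(str) {
    // TextEncoder always produces UTF-8, whatever the page encoding is.
    const bytes = new TextEncoder().encode(str);
    return Array.from(bytes, b => b.toString(16).padStart(2, "0")).join(" ");
}

// The hard-coded beginning of body from the question.
const body = "\u00b0\u00a7\u00da\u007a";
console.log(utf8Hex(body)); // c2 b0 c2 a7 c3 9a 7a
```

Every character from U+0080 to U+00FF takes two bytes in UTF-8, which is why only 7a (below U+0080) passes through unchanged.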

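It appears that JavaScript is UTF-8-encoding your body variable when it sends the request. You could try using the utf8_decode function in PHP to convert the string back to single bytes. Note that utf8_decode is deprecated as of PHP 8.2; mb_convert_encoding($msg, 'ISO-8859-1', 'UTF-8') performs the same conversion.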
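To illustrate what that decoding step does to the bytes, here is the same conversion sketched in JavaScript rather than PHP. `utf8ToLatin1Bytes` is a hypothetical name, and unlike utf8_decode (which substitutes "?" for characters it cannot represent) this sketch throws on anything above U+00FF.

```javascript
// Hypothetical helper: decode UTF-8 bytes to text, then map each character
// back to a single byte. Only valid when every character is U+00FF or below.
function utf8ToLatin1Bytes(utf8Bytes) {
    const text = new TextDecoder("utf-8").decode(Uint8Array.from(utf8Bytes));
    return Array.from(text, ch => {
        const code = ch.codePointAt(0);
        if (code > 0xff) {
            throw new RangeError("character outside Latin-1: U+" + code.toString(16));
        }
        return code;
    });
}

// The first seven bytes PHP reported in the question.
const received = [0xc2, 0xb0, 0xc2, 0xa7, 0xc3, 0x9a, 0x7a];
const original = utf8ToLatin1Bytes(received);
console.log(original.map(b => b.toString(16).padStart(2, "0")).join(" ")); // b0 a7 da 7a
```

This only round-trips cleanly because every value in the original data fits in one byte.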

Upvotes: 3
