user2384994

Reputation: 1847

Encoding error when sending Chinese characters as JSON

I have a Chinese string "普派" that I want to send from the client to a web server in an HTTP POST request. On the client side, I use the following jQuery code:

$.ajax({
    url: 'http://127.0.0.1:8000/detect/word',
    type: 'POST',
    data: JSON.stringify('普派'),
    success: function(msg) {
        alert(msg);
    }
});

On the server side, I use Python 3.3:

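import json
from http.server import BaseHTTPRequestHandler

class DictRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        post_data = self.rfile.read(int(self.headers['Content-Length']))
        # Decode the request body as UTF-8 before parsing the JSON
        post_var = json.loads(post_data.decode('utf-8'))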

But the result (post_var) is garbled. The variable post_data, of type bytes, is b'"\xc3\xa6\xe2\x84\xa2\xc2\xae\xc3\xa6\xc2\xb4\xc2\xbe"', whereas I expected it to be b'"\u666e\u6d3e"' (what json.dumps("普派").encode() produces) for the conversion to work. Could you please help me solve this problem? Thank you very much.

Upvotes: 1

Views: 6970

Answers (1)

mata

Reputation: 69012

The result of JSON.stringify('普派') depends on the encoding of your source file. Remember, what's really between those quotes is just a bunch of bytes; it's your editor (or browser) that displays them as '普派'.
If the browser correctly detects your source encoding, then it shouldn't really matter, but if it doesn't, you'll end up with garbage.
So make sure to declare the correct file encoding (preferably UTF-8).
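
To illustrate how a wrong guess produces exactly this kind of garbage, here is a minimal Python sketch. It assumes the browser read the UTF-8 source file as Windows-1252 (cp1252); that is a guess on my part, but it does reproduce the bytes shown in the question. The variable names are only for illustration.

```python
# Simulate a browser misreading a UTF-8 source file as Windows-1252.
original = "普派"
utf8_bytes = original.encode("utf-8")         # bytes actually stored in the source file
misread = utf8_bytes.decode("cp1252")         # what the browser thinks the text is
sent = ('"' + misread + '"').encode("utf-8")  # JSON string as sent in the POST body

print(utf8_bytes)
print(sent)

# Undoing the two steps in reverse order recovers the original text.
recovered = sent.decode("utf-8").strip('"').encode("cp1252").decode("utf-8")
print(recovered)
```

The round trip at the end is only a diagnostic; the real fix is still to serve the page with the right encoding.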

To be independent of such browser-dependent interpretation, try changing it to JSON.stringify("\u666e\u6d3e").

The JSON standard doesn't mandate that Unicode characters be replaced with their Unicode escape sequences on encoding. It only requires the text to be encoded in Unicode, and it allows 'any Unicode character' within JSON strings, so the result of JSON.stringify isn't wrong if it emits the given characters as UTF-8.
Either form should be fine, so what you see on the server side should probably be b'"\xe6\x99\xae\xe6\xb4\xbe"'.
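
As a quick check on the Python side, the sketch below parses both forms and compares the results. The variable names are hypothetical; json.loads and the ensure_ascii parameter of json.dumps are part of the standard json module.

```python
import json

# JSON text using \u escapes (pure ASCII)
escaped = '"\\u666e\\u6d3e"'
# JSON text carrying the characters themselves, received as UTF-8 bytes
raw = b'"\xe6\x99\xae\xe6\xb4\xbe"'.decode("utf-8")

from_escaped = json.loads(escaped)
from_raw = json.loads(raw)
print(from_escaped == from_raw)

# json.dumps escapes non-ASCII characters by default;
# ensure_ascii=False keeps them as-is in the output string.
default_dump = json.dumps("普派")
literal_dump = json.dumps("普派", ensure_ascii=False)
print(default_dump, literal_dump)
```

This is also why json.dumps("普派").encode() in the question shows escape sequences: that is Python's default output, not a requirement of the format.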

Upvotes: 2
