Ray Jhong

Reputation: 77

How do I decode raw data from Flask's request.get_data() into Chinese characters?

I'm using Flask to build a web server that handles Chinese-language requests sent via POST. Originally I planned to read the content with request.form['body']. However, the client encodes its data as BIG5, while the values returned by request.form are always decoded as UTF-8, so I have to call request.get_data() to retrieve the raw request body and decode it myself.

The odd part is that when enctype is multipart/form-data, everything works: request.get_data().decode('big5') gives the correct characters. But when I don't specify enctype, so the form defaults to application/x-www-form-urlencoded, the returned value looks like this:

Result 1.

%B6W%C3%D9%A4u%B5%7B%A6%B3%AD%AD%A4%BD%A5q

which does not look like BIG5-encoded data. The original text should look like this:

Result 2.

超贊工程有限公司

The BIG5-encoded version should look like this:

Result 3.

\xb6W\xc3\xd9\xa4u\xb5{\xa6\xb3\xad\xad\xa4\xbd\xa5q

My question: how can I properly decode the form data from Result 1 into Result 2 when the request uses application/x-www-form-urlencoded?

Code and result when the Content-Type is application/x-www-form-urlencoded: [screenshot]

Code and result when the Content-Type is multipart/form-data: [screenshot]

Upvotes: 1

Views: 1710

Answers (1)

truth

Reputation: 1186

You're getting a URL-encoded (percent-encoded) string: each %XX is one byte of the BIG5 data. Use urllib.parse to decode it:

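# Import the submodule explicitly; a bare "import urllib" does not reliably load urllib.parse
from urllib.parse import unquote

data = '%B6W%C3%D9%A4u%B5%7B%A6%B3%AD%AD%A4%BD%A5q'
# Turn each %XX escape back into a byte, then decode the bytes as Big5
print(unquote(data, encoding='big5'))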

This prints 超贊工程有限公司, which matches your expected output (Result 2).
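If you need to parse the whole request body rather than a single value, something along these lines may work. This is only a sketch: parse_big5_form is a hypothetical helper name, the raw bytes stand in for what request.get_data() would return in your view, and the field name body is taken from your question. It also assumes the client percent-encodes every non-ASCII byte, so the body itself is plain ASCII.

```python
from urllib.parse import parse_qs, unquote_to_bytes


def parse_big5_form(raw_body: bytes) -> dict:
    """Hypothetical helper: parse a urlencoded body whose values are Big5."""
    # Assumption: the percent-encoded body is pure ASCII, so this only
    # converts bytes to str without changing any characters.
    text = raw_body.decode("ascii")
    # parse_qs splits on '&' and '=', turns '+' into spaces, and decodes
    # the %XX escapes using the given encoding.
    return parse_qs(text, encoding="big5", errors="strict")


# Stand-in for request.get_data() inside the Flask view.
raw = b"body=%B6W%C3%D9%A4u%B5%7B%A6%B3%AD%AD%A4%BD%A5q"
form = parse_big5_form(raw)
company = form["body"][0]  # parse_qs returns a list of values per field

# Undoing only the percent-encoding yields the Big5 bytes from Result 3.
raw_value = unquote_to_bytes("%B6W%C3%D9%A4u%B5%7B%A6%B3%AD%AD%A4%BD%A5q")
```

Unlike unquote, parse_qs also converts '+' to a space, which is how application/x-www-form-urlencoded represents spaces. Passing errors="strict" makes a wrongly guessed encoding raise an exception instead of silently inserting replacement characters.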

Upvotes: 2
