GAE: Unicode JSON String Not Working in Multipart Post After Blob Upload

Question

To upload files to Google storage, I send a multipart POST request to the URL returned by create_upload_url(redirect_path). The POST request includes the files and a JSON string that I need to use in redirect_path.

If the JSON string contains a Unicode character, it is corrupted by the time GAE forwards the multipart POST request to redirect_path.

JSON string sent to the upload URL:

{"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"}

JSON string after GAE forwards the request to redirect_path:

{"subject": "=E6=97=A5=E6=96=87", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAyc=
jkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6Z=
CQw"}

The Unicode characters become unreadable, and '=' characters and newlines are inserted unexpectedly.

Strangely, a shorter JSON string with Unicode works fine:

{"subject": "日文", "tag_key": ""}

Other points:

The problem only happens in the production environment. I cannot reproduce it on the local development server.
A multipart POST request sent directly to redirect_path does not have the problem. The issue only occurs when posting to the URL returned by blobstore.create_upload_url.
I am using the GAE standard environment, Python, Django, Django REST framework, and Postman for testing.

Please let me know if you think of any possible cause.

snakecharmerb · Accepted Answer

Your JSON string is being encoded with the quoted-printable encoding for transport because it contains non-ASCII characters. The quopri module in Python's standard library provides the tools to handle this:

>>> import quopri
>>> foo = '{"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"}'
>>> encoded = quopri.encodestring(foo)
>>> print encoded
{"subject": "=E6=97=A5=E6=96=87", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAyc=
jkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6Z=
CQw"}

You can decode with quopri.decodestring to get the original string:

>>> print quopri.decodestring(encoded)
{"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"}

The encoding is triggered by the presence of non-ASCII characters in the JSON string. The newlines are inserted because quoted-printable encoding limits each line to 76 characters; a trailing '=' marks a soft line break that is removed again on decoding.
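As a minimal sketch of the decoding step on the receiving side, the snippet below takes the body exactly as shown in the question, undoes the quoted-printable encoding, and parses the result as JSON. It is written for Python 3 (the REPL session above is Python 2), and `decode_json_field` is a hypothetical helper name, not part of GAE or Django. It also assumes the field arrives as bytes and that the original text was UTF-8.

```python
import json
import quopri

# The field as it arrived at redirect_path, copied from the question.
# "=XX" is an escaped byte; a "=" at the end of a line is a soft line break.
received = (
    b'{"subject": "=E6=97=A5=E6=96=87", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAyc=\n'
    b'jkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6Z=\n'
    b'CQw"}'
)


def decode_json_field(raw):
    """Undo quoted-printable, then parse the UTF-8 JSON text (hypothetical helper)."""
    decoded = quopri.decodestring(raw)  # removes soft breaks, restores raw bytes
    return json.loads(decoded.decode("utf-8"))


payload = decode_json_field(received)

# No encoded line is longer than the quoted-printable limit of 76 characters.
longest_line = max(len(line) for line in received.split(b"\n"))
```

After decoding, `payload["subject"]` is the original two-character string and `payload["tag_key"]` is a single unbroken value again.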

You may be able to avoid this problem completely by escaping the non-ASCII characters in your JSON string. For example, Python's json module does this by default:

If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only.

>>> import json
>>> json.dumps({"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"})

{
    "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw",
    "subject": "\u65e5\u6587"
}
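The effect of the default can be checked on the client side before the request is sent. The following is a small Python 3 sketch, not a statement about what GAE does internally: it compares the default output with `ensure_ascii=False`, confirms that only the default output is pure ASCII, and shows that the escaped form parses back to the same data. The variable names are illustrative.

```python
import json

data = {
    "subject": "日文",
    "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw",
}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# For comparison: keep the characters as they are.
raw = json.dumps(data, ensure_ascii=False)

# True only when every character is in the ASCII range.
is_ascii_escaped = all(ord(ch) < 128 for ch in escaped)
is_ascii_raw = all(ord(ch) < 128 for ch in raw)

# The escaped form still decodes to the original data.
round_trip = json.loads(escaped)
```

If the client builds the JSON string by hand or with another library, the same check (every character below 128) indicates whether the string would still contain non-ASCII characters.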
