Reputation: 399
To upload files to Google storage, I send a multipart POST request to the URL returned by create_upload_url(redirect_path). The POST request includes the files and a JSON string that I need to use in the redirect_path handler.
If the JSON string contains Unicode characters, it arrives corrupted after GAE forwards the multipart POST request to my handler.
JSON string sent to the upload URL:
{"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"}
JSON string after GAE forwards the request to redirect_path:
{"subject": "=E6=97=A5=E6=96=87", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAyc=
jkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6Z=
CQw"}
The Unicode characters become unreadable, and '=' characters and newlines are inserted unexpectedly.
Strangely, a shorter JSON string containing the same Unicode characters works fine:
{"subject": "日文", "tag_key": ""}
Other points:
The problem only happens in the production environment. I cannot reproduce it on the local development server.
A multipart POST request sent directly to redirect_path doesn't have the problem. The issue only occurs when posting to the URL returned by blobstore.create_upload_url.
I am using the GAE standard environment, Python, Django, Django REST framework, and Postman for testing.
Please let me know if you can think of any possible cause.
Upvotes: 1
Views: 99
Reputation: 55799
Your JSON string is being encoded with the quoted-printable encoding for transport because it contains non-ASCII characters. The quopri module in Python's standard library provides the tools to handle this:
>>> import quopri
>>> foo = '{"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"}'
>>> encoded = quopri.encodestring(foo)
>>> print encoded
{"subject": "=E6=97=A5=E6=96=87", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAyc=
jkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6Z=
CQw"}
You can decode with quopri.decodestring to get the original string:
>>> print quopri.decodestring(encoded)
{"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"}
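In the redirect handler, the decoding step could look roughly like the sketch below. It is written for Python 3, where quopri works on bytes, so it differs slightly from the Python 2 session above. The helper name decode_forwarded_json is hypothetical, and how you obtain the raw field value depends on your framework. It also assumes the field really was quoted-printable encoded: running it on a value that was never encoded could mangle any literal '=' followed by two hex digits.

```python
import json
import quopri


def decode_forwarded_json(raw):
    """Undo quoted-printable encoding on a form field, then parse it as JSON.

    `raw` is the field value as bytes, as received by the redirect handler.
    The helper name is hypothetical, not part of any GAE or Django API.
    """
    # decodestring turns =XX escapes back into raw bytes and drops the
    # soft line breaks ("=" at end of line) added to wrap long lines.
    decoded = quopri.decodestring(raw)
    return json.loads(decoded.decode("utf-8"))


# The corrupted value from the question, as bytes.
forwarded = (
    b'{"subject": "=E6=97=A5=E6=96=87", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAyc=\n'
    b'jkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6Z=\n'
    b'CQw"}'
)

payload = decode_forwarded_json(forwarded)
print(payload["subject"])
print(payload["tag_key"])
```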
The encoding is triggered by the presence of non-ASCII characters in the JSON string; the newlines are inserted because quoted-printable encoding enforces a maximum line length of 76 characters, breaking longer lines with a trailing '='.
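The line-length behaviour can be checked in isolation. This is a small Python 3 sketch using made-up input (200 ASCII bytes, not data from the question): nothing in it needs an =XX escape, yet the encoder still wraps it with soft line breaks.

```python
import quopri

# 200 ASCII bytes: nothing needs =XX escaping, but the line is too long.
data = b"a" * 200
encoded = quopri.encodestring(data)

# Length of the longest output line, counting its trailing "=".
longest = max(len(line) for line in encoded.splitlines())
print(longest)

# Decoding removes the soft line breaks and restores the input.
restored = quopri.decodestring(encoded)
print(restored == data)
```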
You may be able to avoid this problem completely by escaping the non-ASCII characters in your JSON string. For example, Python's json module does this by default:
If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only.
>>> import json
>>> json.dumps({"subject": "日文", "tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw"})
{
"tag_key": "ahNifnN1aXF1aS1kZXYtMTcwMDAycjkLEgRUZWFtIgtzdWlxdWlfdGVzdAwLEgdUYWdUeXBlGICAgICAgIAKDAsSA1RhZxiAgICAgK6ZCQw",
"subject": "\u65e5\u6587"
}
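A quick way to confirm that the escaped form is safe to send and loses nothing is a round trip like the one below (Python 3, with a shortened sample payload). Whether GAE then leaves a pure-ASCII field untouched is the assumption behind this suggestion, so it is worth verifying in production.

```python
import json

data = {"subject": "日文", "tag_key": ""}

# ensure_ascii=True is the default; it is spelled out here for clarity.
body = json.dumps(data, ensure_ascii=True)

# Raises UnicodeEncodeError if any non-ASCII character slipped through.
body.encode("ascii")

# The receiving side gets the original text back from the \uXXXX escapes.
print(json.loads(body)["subject"])
```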
Upvotes: 1