Mario Corchero

Reputation: 5575

Using json.dumps with ensure_ascii=True

When using json.dumps, the default for ensure_ascii is True, but I find myself continually setting it to False.

In which scenarios would you want it to be True? What is the use case for that option?

From the Docs:

If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.
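For reference, a minimal sketch of the behaviour the docs describe. It assumes Python 3, where str is always Unicode text; the variable names are only illustrative.

```python
import json

text = "汉语"

# ensure_ascii defaults to True: non-ASCII characters become \uXXXX escapes
escaped = json.dumps(text)

# ensure_ascii=False keeps the original characters in the output
raw = json.dumps(text, ensure_ascii=False)

print(escaped)  # "\u6c49\u8bed"
print(raw)      # "汉语"

# Both forms decode back to the same value
assert json.loads(escaped) == json.loads(raw) == text
```

Either output is valid JSON; the only difference is whether the result is restricted to ASCII characters.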

What is the benefit of it?

Upvotes: 27

Views: 48493

Answers (1)

Mario Corchero

Reputation: 5575

Writing up thanks to @user2357112

The first thing to understand is that there is no binary representation in JSON. Therefore all strings should be sequences of valid Unicode code points. If you are trying to json.dumps raw bytes, you might be doing something wrong.
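As a rough illustration of that point, assuming Python 3 (where bytes and str are separate types), the json module rejects raw bytes outright. The base64 step below is one common workaround for carrying binary data, not something taken from this answer, and the names blob, wrapped and restored are made up for the example.

```python
import base64
import json

blob = b"\x00\xff"  # arbitrary binary data, not text

try:
    json.dumps(blob)
    bytes_accepted = True
except TypeError:
    # Python 3's json module does not serialize bytes objects
    bytes_accepted = False

# Assumed workaround: carry the bytes as base64 text inside a JSON string
wrapped = json.dumps({"blob": base64.b64encode(blob).decode("ascii")})
restored = base64.b64decode(json.loads(wrapped)["blob"])
```

If the bytes are really encoded text, decoding them to text before calling json.dumps is usually the simpler fix.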

Then check:

Which makes me assume that:

  • When you are encoding text into JSON and all your strings are unicode, it is fine to use ensure_ascii=False, but it might actually make more sense to leave it set to True and decode the resulting str. (As per the specification, dumps doesn't guarantee a unicode result, though it does return one if you pass unicode.)
  • If you are working with str objects, passing ensure_ascii=False will prevent json from transforming your characters to unicode. You might think you want that, but if you then try to read the result in a browser, for example, weird things might happen.
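One way to picture the "weird things" is a consumer that guesses the wrong encoding. The sketch below assumes Python 3 and a hypothetical reader that decodes the payload as Latin-1 instead of UTF-8; the ASCII-only output survives the mistake, the raw UTF-8 output does not.

```python
import json

data = {"language": "汉语"}

# ASCII-only output: the bytes mean the same in any ASCII-compatible encoding
safe = json.dumps(data).encode("ascii")

# Raw characters: the bytes are only correct if the reader also uses UTF-8
risky = json.dumps(data, ensure_ascii=False).encode("utf-8")

# Hypothetical consumer that wrongly assumes Latin-1
safe_roundtrip = json.loads(safe.decode("latin-1"))
risky_roundtrip = json.loads(risky.decode("latin-1"))

print(safe_roundtrip == data)   # True
print(risky_roundtrip == data)  # False: the text comes back as mojibake
```

This is arguably the main practical benefit of ensure_ascii=True: the output is robust against encoding mix-ups further down the line.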

About how ensure_ascii impacts the result, this is a table that might help.

+-------------------------+--------------+------------------------------+
|          Input          | ensure_ascii |            Output            |
+-------------------------+--------------+------------------------------+
| u"汉语"                 | True         | '"\\u6c49\\u8bed"'           |
| u"汉语"                 | False        | u'"\u6c49\u8bed"'            |
| u"汉语".encode("utf-8") | True         | '"\\u6c49\\u8bed"'           |
| u"汉语".encode("utf-8") | False        | '"\xe6\xb1\x89\xe8\xaf\xad"' |
+-------------------------+--------------+------------------------------+

Note that the last value is unicode text encoded into UTF-8 bytes, which might not be parseable by other JSON decoders.

Moreover, if you mix types (a list of unicode and str) and use ensure_ascii=False, you can get a UnicodeDecodeError (when encoding into JSON, which is mind-bending), as the module will try to return a unicode object but won't be able to convert the str into unicode using the default encoding (ASCII).
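A defensive pattern that avoids the mixed-type problem is to normalize everything to text before encoding. This is only a sketch, written for Python 3 (where the bytes type plays the role of Python 2's str); to_text is a hypothetical helper, and UTF-8 is an assumed encoding for the byte strings.

```python
import json

def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes to text, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# A list mixing text and UTF-8 encoded bytes, as in the failing case above
mixed = ["汉语", "汉语".encode("utf-8")]

# Decode explicitly instead of relying on an implicit default encoding
normalized = [to_text(item) for item in mixed]
result = json.dumps(normalized, ensure_ascii=False)

print(result)  # ["汉语", "汉语"]
```

Decoding at the boundary means json only ever sees one string type, so ensure_ascii becomes purely a choice about the output format.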

Upvotes: 27
