tensor

Reputation: 3340

Fastest way to convert a Python dict to a JSON binary string

I need to convert this Python dict:

   d = {'1': 'myval', '2': 'myval2'}

into a binary JSON string like this:

   json_binary_str = b'{"1": "myval", "2": "myval2"}'

In Python 3, I have this:

   import ujson
   ujson.dumps(d)

But this does not create a binary string. How can I do this?

Upvotes: 6

Views: 27529

Answers (3)

Amarjit Dhillon

Reputation: 2816

If you need to convert the JSON to binary, first serialize the dict to a string using dumps(), then convert that string to binary digits as shown below:

import json

if __name__ == '__main__':
    sent_data = {'1': 'myval', '2': 'myval2'}
    # Serialize the dict to a JSON string
    dumped_json_string = json.dumps(sent_data)
    # Render each character's code point as binary digits, separated by spaces
    binary_data = ' '.join(format(ord(letter), 'b') for letter in dumped_json_string)
    print(binary_data)

    # Reverse the process: binary digits -> characters -> dict
    jsn = ''.join(chr(int(x, 2)) for x in binary_data.split())
    received_data = json.loads(jsn)
    print(received_data)

The output of binary_data is:

1111011 100010 110001 100010 111010 100000 100010 1101101 1111001 1110110 1100001 1101100 100010 101100 100000 100010 110010 100010 111010 100000 100010 1101101 1111001 1110110 1100001 1101100 110010 100010 1111101

The output of received_data is:

{'1': 'myval', '2': 'myval2'}
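Note that the snippet above produces a str of space-separated binary digits, not a bytes object like the b'...' literal in the question, and because it formats code points with ord(), the groups have varying widths. A minimal sketch of a variant that goes through UTF-8 bytes and pads every group to 8 digits, so non-ASCII text should survive too; the helper names to_bit_string and from_bit_string are made up for this illustration:

```python
import json


def to_bit_string(obj):
    """Serialize obj to JSON, then render each UTF-8 byte as 8 binary digits."""
    raw = json.dumps(obj).encode("utf-8")
    return " ".join(format(byte, "08b") for byte in raw)


def from_bit_string(bits):
    """Inverse of to_bit_string: rebuild the bytes, then parse the JSON."""
    raw = bytes(int(chunk, 2) for chunk in bits.split())
    return json.loads(raw.decode("utf-8"))


sent_data = {"1": "myval", "2": "myval2"}
bits = to_bit_string(sent_data)
received_data = from_bit_string(bits)
print(bits)
print(received_data)
```

If a bytes object is what you actually want, this digit-string detour is unnecessary; encoding the dumped string directly is enough.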

Hope it helps!

Upvotes: 0

A.Shoman

Reputation: 3085

I see this as a two-step problem.

Step 1: Convert the dict to a JSON string:

my_string = json.dumps(my_json)

Step 2: Convert the string to a binary string:

my_binary_string = my_string.encode('utf-8')

Or, in one line:

my_binary_string = json.dumps(my_json).encode('utf-8')
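As a runnable sketch of the two steps, using the dict from the question as my_json (an assumption, since this answer does not define it) and adding a round trip through json.loads(), which accepts bytes on Python 3.6 and later:

```python
import json

my_json = {"1": "myval", "2": "myval2"}

# Step 1: dict -> JSON text (a Python str)
my_string = json.dumps(my_json)

# Step 2: str -> bytes
my_binary_string = my_string.encode("utf-8")
print(my_binary_string)  # b'{"1": "myval", "2": "myval2"}'

# Optional: drop the spaces after ',' and ':' for a smaller payload
compact = json.dumps(my_json, separators=(",", ":")).encode("utf-8")

# json.loads() accepts bytes directly, so no manual decode is needed
round_tripped = json.loads(my_binary_string)
```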

Upvotes: 0

Keeely

Reputation: 1005

In the RFC https://www.rfc-editor.org/rfc/rfc7159, it says:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32

At first glance it seems that Python isn't really following the spec: after all, what does it mean to encode something when it remains a Python 3 str? However, Python is doing some encoding for you nonetheless. Try this:

>>> json.dumps({"Japan":"日本"})
'{"Japan": "\\u65e5\\u672c"}'

You can see that the Japanese characters have been converted to Unicode escapes, and the resulting string is actually ASCII, even though it's still a Python str. I'm unsure how to get json.dumps() to actually give you UTF-8 sequences, for interoperability purposes, if you wanted them; however, for practical purposes this is good enough for most people. The characters are there and will be interpreted correctly. It's easy to get binary with:

>>> json.dumps({"Japan":"日本"}).encode("ascii")
b'{"Japan": "\\u65e5\\u672c"}'

And Python does the right thing when loading back in:

>>> json.loads(json.dumps({"Japan":"日本"}).encode("ascii"))
{'Japan': '日本'}

But if you don't bother encoding at all, loads() still figures out what to do when given a str:

>>> json.loads(json.dumps({"Japan":"日本"}))
{'Japan': '日本'}
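On the point above about getting real UTF-8 sequences out of json.dumps(): the standard library function takes an ensure_ascii parameter, which defaults to True. A minimal sketch comparing the two forms (the variable names are only illustrative):

```python
import json

data = {"Japan": "日本"}

# Default: non-ASCII characters become \uXXXX escapes, so the result is pure ASCII
escaped = json.dumps(data).encode("ascii")

# ensure_ascii=False keeps the characters as-is; encoding then yields UTF-8 bytes
utf8 = json.dumps(data, ensure_ascii=False).encode("utf-8")

print(escaped)
print(utf8)

# Both byte strings decode back to the same dict
assert json.loads(escaped) == json.loads(utf8) == data
```

The UTF-8 form is shorter here, since each of the two characters takes 3 bytes instead of a 6-byte escape.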

Python is - as ever - trying to be as helpful as possible in figuring out what you want and doing it, but this is perplexing to people who dig a little deeper, and in spite of loving Python to bits I sympathise with the OP. Whether this kind of 'helpful' behaviour is worth the confusion is a debate that will rage on.

It's worth noting that if the next step is writing the output to a file, you can just do:

pathlib.Path("myfile.json").write_text(json_data)

Then you don't need it as binary, because the file is opened in text mode and the encoding is done for you. Using write_text() also closes the file once the data is written.
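A small sketch of that file round trip, assuming json_data is the str returned by json.dumps(). It writes into a temporary directory so it can be run without leaving files behind, and passes the encoding explicitly rather than relying on the platform default:

```python
import json
import pathlib
import tempfile

data = {"Japan": "日本"}
json_data = json.dumps(data)  # a str, pure ASCII by default

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "myfile.json"

    # Text mode: pathlib encodes the str for us
    path.write_text(json_data, encoding="utf-8")

    # What actually landed on disk is bytes
    raw_bytes = path.read_bytes()

    # json.loads() accepts those bytes directly
    loaded = json.loads(raw_bytes)

print(raw_bytes)
print(loaded)
```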

Upvotes: 4
