Reputation: 3340
I need to convert this Python dict into binary JSON (a bytes object), like json_binary_str below:
d = {'1': 'myval', '2': 'myval2'}
json_binary_str = b'{"1": "myval", "2": "myval2"}'
In Python 3, I have this:
import ujson
ujson.dumps(d)
But this does not create a binary string. How can I do this?
Upvotes: 6
Views: 27529
Reputation: 2816
If you need to convert the JSON to binary, first convert it to a string using dumps(), then convert that string to binary as shown below:
import json

if __name__ == '__main__':
    sent_data = {'1': 'myval', '2': 'myval2'}
    # Serialize the dict to a JSON string
    dumped_json_string = json.dumps(sent_data)
    # Render each character's code point as a string of binary digits
    binary_data = ' '.join(format(ord(letter), 'b') for letter in dumped_json_string)
    print(binary_data)
    # Reverse the process: turn each group of bits back into a character
    jsn = ''.join(chr(int(x, 2)) for x in binary_data.split())
    received_data = json.loads(jsn)
    print(received_data)
The output of binary_data is
1111011 100010 110001 100010 111010 100000 100010 1101101 1111001 1110110 1100001 1101100 100010 101100 100000 100010 110010 100010 111010 100000 100010 1101101 1111001 1110110 1100001 1101100 110010 100010 1111101
The output of received_data (under Python 3) is
{'1': 'myval', '2': 'myval2'}
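Note that binary_data above is a text string made of '0' and '1' characters, not a Python bytes object, and ord() works on code points rather than encoded bytes. As a rough sketch only, if a bit-string view of the actual encoded bytes is wanted, one option is to encode to UTF-8 first and format each byte as 8 bits. The helper names to_bits and from_bits are made up for this illustration:

```python
import json


def to_bits(obj):
    """Serialize obj to JSON, encode it as UTF-8, and render each byte as 8 bits."""
    raw = json.dumps(obj).encode("utf-8")
    return " ".join(format(byte, "08b") for byte in raw)


def from_bits(bits):
    """Rebuild the bytes from the space-separated bit groups and parse the JSON."""
    raw = bytes(int(chunk, 2) for chunk in bits.split())
    return json.loads(raw.decode("utf-8"))


data = {'1': 'myval', '2': 'myval2'}
bits = to_bits(data)
restored = from_bits(bits)
print(bits)
print(restored)
```

Because every group is exactly one byte, this also round-trips non-ASCII text if json.dumps() is ever called with ensure_ascii=False.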
Hope it helps!
Upvotes: 0
Reputation: 3085
I see this as a two-step problem.
Step 1: Serialize the Python object to a JSON string
import json
my_string = json.dumps(my_json)
Step 2: Encode the string to bytes (a binary string)
my_binary_string = my_string.encode('utf-8')
Or in one line:
my_binary_string = json.dumps(my_json).encode('utf-8')
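For reference, here is a small self-contained sketch of the two steps applied to the dict from the question, plus the reverse direction. The variable round_trip is just an illustrative name, and the dict literal stands in for my_json:

```python
import json

# The dict from the question stands in for my_json here
my_json = {'1': 'myval', '2': 'myval2'}

# Serialize to a str, then encode that str to bytes
my_binary_string = json.dumps(my_json).encode('utf-8')
print(type(my_binary_string))  # <class 'bytes'>
print(my_binary_string)        # b'{"1": "myval", "2": "myval2"}'

# Going back: decode the bytes to str, then parse the JSON
round_trip = json.loads(my_binary_string.decode('utf-8'))
print(round_trip == my_json)   # True
```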
Upvotes: 0
Reputation: 1005
In the RFC https://www.rfc-editor.org/rfc/rfc7159, it says:
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32
At first glance, Python does not seem to follow the spec here: what does it mean to encode something when the result is still a Python 3 str? Python is nonetheless doing some encoding for you. Try this:
>>> json.dumps({"Japan":"日本"})
'{"Japan": "\\u65e5\\u672c"}'
You can see that the Japanese has been converted to Unicode escapes, and the resulting string is pure ASCII, even though it's still a Python str. If you want json.dumps() to give you real UTF-8 sequences instead (for interoperability purposes, say), pass ensure_ascii=False and encode the result as UTF-8; for most practical purposes, though, the escaped form is good enough. The characters are there and will be interpreted correctly. It's easy to get binary with:
>>> json.dumps({"Japan":"日本"}).encode("ascii")
b'{"Japan": "\\u65e5\\u672c"}'
And Python does the right thing when loading it back in:
>>> json.loads(json.dumps({"Japan":"日本"}).encode("ascii"))
{'Japan': '日本'}
But if you don't bother encoding at all, loads() still figures out what to do when given a str:
>>> json.loads(json.dumps({"Japan":"日本"}))
{'Japan': '日本'}
Python is - as ever - trying to be as helpful as possible in figuring out what you want and doing it, but this is perplexing to people who dig a little deeper, and in spite of loving Python to bits I sympathise with the OP. Whether this kind of 'helpful' behaviour is worth the confusion is a debate that will rage on.
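To make the difference between escaped ASCII output and real UTF-8 output concrete, here is a minimal sketch comparing the two. The variable names escaped and utf8 are only for illustration; ensure_ascii is a standard json.dumps() parameter:

```python
import json

data = {"Japan": "日本"}

# Default: non-ASCII characters become \uXXXX escapes, so ASCII encoding works
escaped = json.dumps(data).encode("ascii")

# ensure_ascii=False keeps the characters as-is; encode them as UTF-8 bytes
utf8 = json.dumps(data, ensure_ascii=False).encode("utf-8")

print(escaped)
print(utf8)

# json.loads() accepts bytes (Python 3.6+) and both forms decode to the same dict
print(json.loads(escaped) == json.loads(utf8) == data)
```

Both byte strings are valid JSON; the UTF-8 form is shorter for non-ASCII text, while the escaped form survives channels that only handle ASCII.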
Worth noting that if the next thing to be done with the output is writing to a file, then you can just do:
pathlib.Path("myfile.json").write_text(json_data)
Then you don't need it binary because the file is opened in text mode and encoding is done for you.
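As a hedged sketch of the file case, the snippet below writes JSON through a text-mode file and then reads the raw bytes back to show what ended up on disk. It uses a temporary directory so it can run anywhere, and passes encoding="utf-8" explicitly as an assumption, since the default text encoding depends on the platform:

```python
import json
import pathlib
import tempfile

data = {"Japan": "日本"}

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "myfile.json"

    # Text mode: json.dump() writes str and the file object does the encoding
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f)

    # The bytes on disk are the "binary JSON", with no manual encode() needed
    raw = path.read_bytes()
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(raw)
print(loaded == data)
```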
Upvotes: 4