Reputation: 2111
What's the best way to consistently hash an object/dictionary that's limited to what JSON can represent, in both JavaScript and Python? What about in many different languages?
Of course there are hash functions implemented consistently in many different languages that take a string, but to hash an object you have to convert it to a string representation first.
I want a hash function that will always return the same value for the same dictionary in any language, but the JSON spec doesn't guarantee anything about the order of keys in the serialized representation.
Do json.dumps() and JSON.stringify() behave identically? How would you verify this?
If not, is there a serialization format with libraries in many languages (I'm practically interested in Python and JavaScript but also curious about all languages) that doesn't require any additional processing by the caller to produce consistent results?
Upvotes: 3
Views: 5319
Reputation: 864
I thought I might attempt a practical example.
In JavaScript I did:
import stringify from 'json-stable-stringify'
import sha256 from 'simple-sha256'
// simple-sha256 returns a Promise; top-level await works in ES modules
const hash_str = await sha256(stringify({'hello':'goodbye', '123': 456}))
// hash_str = 72804f4e0847a477ee69eae4fbf404b03a6c220bacf8d5df34c964985acd473f
json-stable-stringify guarantees sorted keys (recursively), so the output does not depend on the order in which keys were inserted. simple-sha256 works in both Node.js and the browser.
In Python 3.8 I did:
import hashlib
import json
payload = json.dumps({'hello':'goodbye', '123': 456}, sort_keys=True, separators=(',', ':'))
hash_str = hashlib.sha256(payload.encode("utf-8")).hexdigest()
# hash_str = 72804f4e0847a477ee69eae4fbf404b03a6c220bacf8d5df34c964985acd473f
I haven't done extensive testing yet, but with the JSON objects I've tried, the two hashes have matched.
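One way to do that more extensive testing is to keep a shared list of test vectors (input object plus expected serialized string) and check both languages against it. Below is a minimal Python sketch of the Python half. The expected strings are what json-stable-stringify should emit for these inputs, written by hand rather than captured from a real run; in practice you would generate them on the JavaScript side. The names canonical_json, TEST_VECTORS and check_vectors are hypothetical. The sketch also passes ensure_ascii=False, which the Python snippet above omits. By default json.dumps escapes non-ASCII characters as \uXXXX, while JSON.stringify leaves them raw. As a result, any string containing a character such as é would hash differently.

```python
import json


def canonical_json(obj) -> str:
    # Sorted keys and no whitespace mirror json-stable-stringify's output;
    # ensure_ascii=False keeps non-ASCII characters raw, as JSON.stringify does.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Hypothetical test vectors: each expected string is what we assume the
# JavaScript side (json-stable-stringify) produces for the same input.
TEST_VECTORS = [
    ({"hello": "goodbye", "123": 456}, '{"123":456,"hello":"goodbye"}'),
    ({"b": [1, 2, {"d": None, "c": True}], "a": 1}, '{"a":1,"b":[1,2,{"c":true,"d":null}]}'),
    ({"name": "café"}, '{"name":"café"}'),
]


def check_vectors(vectors):
    """Return the inputs whose Python serialization differs from the expected string."""
    return [obj for obj, expected in vectors if canonical_json(obj) != expected]


if __name__ == "__main__":
    mismatches = check_vectors(TEST_VECTORS)
    print("all vectors match" if not mismatches else f"mismatches: {mismatches}")
    # Without ensure_ascii=False the third vector would fail: é becomes \u00e9.
    print(json.dumps({"name": "café"}, sort_keys=True, separators=(",", ":")))
```

The same vectors file can be loaded by a JavaScript test, so both sides are checked against one source of truth.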
Upvotes: 0
Reputation: 79521
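I would split this into two problems:
1. Serializing the object to exactly the same JSON string in both languages.
2. Hashing that string.
Use (1) to get two strings, then UTF-8 encode them, then use (2) to get hashes.
Since (2) is straightforward, I'll only address (1).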
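The two-step split can be sketched in Python as follows. The sketch assumes a serializer that both sides agree on. The names canonical_json and hash_json are hypothetical. The convention of sorted keys, no whitespace and raw UTF-8 is one assumed choice, not the only possible one. Because the serializer and the hash algorithm are parameters, each step can be swapped independently.

```python
import hashlib
import json
from typing import Any, Callable


def canonical_json(obj: Any) -> str:
    # Assumed convention: sorted keys, no whitespace, non-ASCII left unescaped.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def hash_json(obj: Any,
              serialize: Callable[[Any], str] = canonical_json,
              algorithm: str = "sha256") -> str:
    # Step (1): object -> string, using a serializer both languages agree on.
    text = serialize(obj)
    # UTF-8 encode, so both sides hash exactly the same bytes.
    data = text.encode("utf-8")
    # Step (2): any standard hash works once the bytes match.
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    print(hash_json({"b": 2, "a": 1}))
    print(hash_json({"b": 2, "a": 1}, algorithm="sha512"))
```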
There are multiple facets to the problem of making sure the two JSON strings you generate are identical:
- Number formatting: you may get 1 on one side and 1.0 on the other. (This probably won't be as big of an issue, however.)
- String escaping: JSON only requires that " and \ (plus control characters) be backslash-escaped in JSON strings. Most serializers do more than necessary, however, and reduce almost all Unicode characters to the \uXXXX equivalent. See json.org for the details on JSON string encoding. One way to remove all ambiguity is to only escape when absolutely necessary.

You'll want to make sure all of these are matched between JavaScript and Python. Most JSON serialization libraries I've used provide configuration hooks for all of the things I mention in the list above. Unfortunately, I'm not very familiar with the JavaScript or Python libraries.
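The facets above can also be pinned down by writing the serializer yourself instead of relying on library defaults. The following is a minimal Python sketch, not a complete implementation, and the helper names are hypothetical. The sketch makes four choices:
- It writes integral floats as integers, so 1.0 becomes 1, as JavaScript would print it.
- It escapes only the quote, backslash and control characters.
- It sorts keys.
- It emits no whitespace.

Number formatting is only aligned for the cases noted in the comments.

```python
import math
from typing import Any

# Escapes shared by Python's json module (ensure_ascii=False) and JSON.stringify.
_SHORT_ESCAPES = {
    '"': '\\"', "\\": "\\\\", "\b": "\\b", "\f": "\\f",
    "\n": "\\n", "\r": "\\r", "\t": "\\t",
}


def encode_string(s: str) -> str:
    """Escape only what JSON requires: quote, backslash and U+0000..U+001F."""
    out = []
    for ch in s:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ch < "\x20":
            out.append("\\u%04x" % ord(ch))  # lowercase hex, e.g. \u001f
        else:
            out.append(ch)  # everything else, including non-ASCII, stays raw
    return '"' + "".join(out) + '"'


def encode_number(x: Any) -> str:
    if isinstance(x, float):
        if not math.isfinite(x):
            raise ValueError("NaN and Infinity have no JSON representation")
        # JavaScript has a single number type, so 1.0 prints as "1" there.
        # Below 2**53 JavaScript prints an integral number as its exact digits.
        if x.is_integer() and abs(x) < 2 ** 53:
            return str(int(x))
        # Assumption: repr() matches JavaScript for ordinary values like 2.5,
        # but exponent formatting differs (Python "1e-05", JavaScript "0.00001").
        return repr(x)
    # Python ints are exact; JavaScript cannot represent integers beyond 2**53.
    return str(x)


def canonical(obj: Any) -> str:
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # test before int: bool is a subclass of int
        return "true" if obj else "false"
    if isinstance(obj, (int, float)):
        return encode_number(obj)
    if isinstance(obj, str):
        return encode_string(obj)
    if isinstance(obj, (list, tuple)):
        return "[" + ",".join(canonical(v) for v in obj) + "]"
    if isinstance(obj, dict):
        if not all(isinstance(k, str) for k in obj):
            raise TypeError("this sketch only accepts string keys")
        # sorted() compares code points; JavaScript's default sort compares
        # UTF-16 code units, which can differ only for non-BMP characters.
        return "{" + ",".join(
            encode_string(k) + ":" + canonical(obj[k]) for k in sorted(obj)
        ) + "}"
    raise TypeError(f"not JSON-serializable: {type(obj).__name__}")


if __name__ == "__main__":
    print(canonical({"b": [1.0, 2.5, True, None], "a": 'x"y\n'}))
    # {"a":"x\"y\n","b":[1,2.5,true,null]}
```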
Upvotes: 4
Reputation: 51847
JSON is a well-defined language for representing the state of objects. The functions do not behave identically, but they do behave equivalently.
For instance:
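json.dumps({'hello':'goodbye', 123: 456})
may produce either of the following, depending on the order of the dict's keys (arbitrary before Python 3.7, insertion order since then):
{"hello": "goodbye", "123": 456}
or
{"123": 456, "hello": "goodbye"}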
And if you pass in the indent parameter, you get even more possible results.
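The following Python sketch shows this variation concretely. It assumes Python 3.7+ dict ordering. It uses the string key "123" because json.dumps converts the integer key 123 to "123" anyway. In CPython, mixing int and str keys with sort_keys=True raises a TypeError. JavaScript adds a twist of its own: JSON.stringify({'hello':'goodbye', 123: 456}) yields {"123":456,"hello":"goodbye"}, because integer-like property names are enumerated first.

```python
import json

# Same content, different insertion order (Python 3.7+ dicts keep insertion order).
a = {"hello": "goodbye", "123": 456}
b = {"123": 456, "hello": "goodbye"}

print(json.dumps(a))            # {"hello": "goodbye", "123": 456}
print(json.dumps(b))            # {"123": 456, "hello": "goodbye"}
print(json.dumps(a, indent=2))  # same data again, spread over several lines

# Pinning key order and separators collapses all of these to a single string.
canonical = {"sort_keys": True, "separators": (",", ":")}
print(json.dumps(a, **canonical))  # {"123":456,"hello":"goodbye"}
assert json.dumps(a, **canonical) == json.dumps(b, **canonical)
```

The outputs are all equivalent JSON, but only the pinned form is byte-for-byte stable, which is what a hash needs.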
Most languages either have a built-in way to handle JSON (e.g. Python and JS) or have a third-party library that is perfectly sufficient (see the Newtonsoft JSON library for .NET).
Each language that I'm aware of will produce valid JSON, which means that it can be parsed by each other language that provides a JSON parser.
Upvotes: 0