Reputation: 412
(There are a fair few similar posts. I've read them; the solutions largely seem to be about Python string encoding, which I thought I had under control but clearly still don't.)
I'm trying to port an NPM package to Python, but I can't get the same results from each script. I've stripped it down to this:
import sys
import hashlib
from binascii import hexlify
print("Python", sys.version)
test1 = "abcdefg".encode("utf-8")
print(hexlify(test1), hashlib.sha256(test1).hexdigest())
test2 = "abcdefg".encode("latin1")
print(hexlify(test2), hashlib.sha256(test2).hexdigest())
test3 = "abcdefg".encode("ascii")
print(hexlify(test3), hashlib.sha256(test3).hexdigest())
test4 = b"abcdefg"
print(hexlify(test4), hashlib.sha256(test4).hexdigest())
test5 = bytes([0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67])
print(hexlify(test5), hashlib.sha256(test5).hexdigest())
var js_sha3 = require('js-sha3')
var crypto = require('crypto');
var buffer = require('buffer');
console.log("Javascript:", process.version)
function toHex(str) {
  return Buffer.from(str).toString('hex')
}
var test1 = "abcdefg"
console.log(toHex(test1).toString('hex'), js_sha3.sha3_256(test1))
var test2 = Buffer.from("abcdefg")
console.log(toHex(test2), js_sha3.sha3_256(test2))
var test3 = "abcdefg"
console.log(toHex(test3), crypto.createHash('sha3-256').update(test3).digest("hex"))
var test4 = Buffer.from("abcdefg")
console.log(toHex(test4), crypto.createHash('sha3-256').update(test4).digest("hex"))
var test5 = buffer.Buffer.from("abcdefg", 'hex')
console.log(toHex(test5), js_sha3.sha3_256(test5))
However, the output looks like this:
Python 3.7.4 (default, Sep 7 2019, 18:27:02)
[Clang 10.0.1 (clang-1001.0.46.4)]
b'61626364656667' 7d1a54127b222502f5b79b5fb0803061152a44f92b37e23c6527baf665d4da9a
b'61626364656667' 7d1a54127b222502f5b79b5fb0803061152a44f92b37e23c6527baf665d4da9a
b'61626364656667' 7d1a54127b222502f5b79b5fb0803061152a44f92b37e23c6527baf665d4da9a
b'61626364656667' 7d1a54127b222502f5b79b5fb0803061152a44f92b37e23c6527baf665d4da9a
b'61626364656667' 7d1a54127b222502f5b79b5fb0803061152a44f92b37e23c6527baf665d4da9a
Javascript: v12.15.0
61626364656667 7d55114476dfc6a2fbeaa10e221a8d0f32fc8f2efb69a6e878f4633366917a62
61626364656667 7d55114476dfc6a2fbeaa10e221a8d0f32fc8f2efb69a6e878f4633366917a62
61626364656667 7d55114476dfc6a2fbeaa10e221a8d0f32fc8f2efb69a6e878f4633366917a62
61626364656667 7d55114476dfc6a2fbeaa10e221a8d0f32fc8f2efb69a6e878f4633366917a62
abcdef 8b8a2a6bc589cd378fc57f47d5668c58b31167b2bf9e632696e5c2d50fc16002
However, entering abcdefg into https://emn178.github.io/online-tools/sha256.html (which is backed by js-sha3) returns 7d1a54...
So, my question is - how does my use of SHA-256 in Javascript and Python differ? What am I missing? (I'm not going to try and claim that one of the implementations is broken!)
[edit] If I use MD5 instead of SHA-256, the results match, adding further to the mystery!
x = bytes("thequickbrownfox", "utf-8")
print(hashlib.md5(x).hexdigest())
print(hashlib.sha256(x).hexdigest())
var x = "thequickbrownfox"
console.log(crypto.createHash('md5').update(x).digest("hex"))
console.log(crypto.createHash('sha3-256').update(x).digest("hex"))
outputs:
308fb76dc4d730360ee33932d2fb1056
bd484b82d7875e115c7273e9c6102ca4946b7c55fe905012be9152b87fe09568
308fb76dc4d730360ee33932d2fb1056
4822316e0d7a7a2ce1bb6489e57c73ca5db4c7c660c79c3c65839bd4aaf4ef10
Upvotes: 3
Views: 1176
Reputation: 412
Today I learned the very important difference between sha256 and sha3-256.
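For anyone who lands here with the same mismatch, here is a minimal sketch of the difference on the Python side. It assumes Python 3.6 or later, where hashlib exposes sha3_256 alongside sha256. The two digests below are the ones shown in the question's own output: the first is what Python printed, the second is what the JavaScript code printed.

```python
import hashlib

data = b"abcdefg"

# SHA-256 is a member of the SHA-2 family.
sha2_digest = hashlib.sha256(data).hexdigest()

# SHA3-256 is a member of the SHA-3 family: same digest length,
# but a completely different algorithm.
sha3_digest = hashlib.sha3_256(data).hexdigest()

print("sha256  ", sha2_digest)  # matches the Python output in the question
print("sha3-256", sha3_digest)  # matches the JavaScript output in the question
```

So the Python script was computing SHA-256 while the JavaScript was computing SHA3-256. Both implementations are fine; they just implement different hash functions. MD5 matched because both scripts really did ask for the same algorithm there.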
Upvotes: 0
Reputation: 734
It is certainly correct that a hash function produces the same digest for the same input. Where it gets tricky is that these hash functions take their input as bytes, so the way a given string is encoded into bytes can depend on the platform. Different programming languages may also have subtle differences: I'm not familiar with Python, but it might, for example, append whitespace to the input or use a different Unicode representation for certain special characters. A change of even one byte in the input produces a completely different output, as one would expect from a hash function.
To sum up, if you want to find out why a different hash is produced, perform a binary comparison of the inputs to the hash functions, or print the input to SHA-256 in hexadecimal or Base64.
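As a rough sketch of that kind of check in Python: describe_input below is a hypothetical helper (not part of any library) that dumps the exact bytes being hashed in hexadecimal and Base64, next to the digest. The trailing-space input is only an illustration of a one-byte difference, not something Python adds on its own.

```python
import base64
import hashlib


def describe_input(data: bytes) -> dict:
    """Hypothetical helper: show exactly which bytes are about to be hashed."""
    return {
        "length": len(data),
        "hex": data.hex(),
        "base64": base64.b64encode(data).decode("ascii"),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


clean = describe_input("abcdefg".encode("utf-8"))
padded = describe_input("abcdefg ".encode("utf-8"))  # one extra byte (0x20)

for label, info in (("clean", clean), ("padded", padded)):
    print(label, info["length"], info["hex"], info["base64"], info["sha256"])
```

If the hex dumps are identical on both platforms and the digests still differ, the inputs are not the problem, and the next thing to compare is which hash algorithm each side is actually calling.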
Upvotes: 2