Robert Christian

Reputation: 18310

How to iteratively compute SHA-256 in Python using the native library (i.e. hashlib), with byte[] as input and not a hex string

Background: I have an iterative hash algorithm I need to compute from a Python script and a Java web application.

Pseudocode:

hash = sha256(raw)
for x=1 to 64000 hash = sha256(hash)

where hash is a byte array of length 32, and not a hex string of length 64.

The reason I want to keep it in bytes is that, although Python can convert to a hex string between iterations and still keep the processing time under a second, Java takes 3 seconds because of the String overhead.

So, the Java code looks like this:

// hash one time...
byte[] result = sha256(raw.getBytes("UTF-8"));

// then hash 64k-1 more times
for (int x = 0; x < 64000-1; x++) {
  result = sha256(result);
}

// hex encode and print result (needs java.util.Formatter)
StringBuilder sb = new StringBuilder();
Formatter formatter = new Formatter(sb);
for (int i = 0; i < result.length; i++) {
  formatter.format("%02x", result[i]);
}
System.out.println(sb.toString());

And the Python code looks like this:

import hashlib

# hash 1 time...
hasher = hashlib.sha256()
hasher.update(raw)
digest = hasher.digest()

# then hash 64k-1 times
for x in range (0, 64000-1):
  # expect digest is bytes and not hex string
  hasher.update(digest) 
  digest = hasher.digest()
print digest.encode("hex")

The Python result appears to be computed on the hex representation of the first digest (a string) rather than on the raw digest bytes, so the two programs produce different outputs.

Upvotes: 4

Views: 4487

Answers (1)

Michał Zieliński

Reputation: 1345

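The hasher's .update method appends its argument to the data that has already been fed in (see the Python docs); it does not start over. In your loop, each call to digest() therefore hashes the original input followed by every digest produced so far, not just the most recent digest. Instead, create a new hasher each time you want to compute a digest.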
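To make the appending behaviour concrete, here is a small sketch. The inputs `first` and `second` are hypothetical values chosen only for illustration; the only library calls used are `hashlib.sha256`, `update` and `digest`.

```python
import hashlib

# Hypothetical sample inputs, chosen only for illustration.
first = b"abc"
second = b"def"

hasher = hashlib.sha256()
hasher.update(first)
hasher.update(second)  # appends to the data already fed in; it does not reset

# Feeding the pieces one at a time is equivalent to hashing their concatenation.
appended = hasher.digest()
concatenated = hashlib.sha256(first + second).digest()

# A fresh hasher that only sees the second piece gives a different digest.
fresh = hashlib.sha256(second).digest()

print(appended == concatenated)  # True
print(appended == fresh)         # False
```

This is why reusing one hasher across iterations does not compute sha256(sha256(...)).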

import hashlib

# hash 1 time...
digest = hashlib.sha256(raw).digest()

# then hash 64k-1 times
for x in range(0, 64000-1):
  digest = hashlib.sha256(digest).digest()
print digest.encode("hex")
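
The print statement and .encode("hex") above are Python 2 idioms. For Python 3, a roughly equivalent sketch might look like the following; the function name `iterated_sha256`, the `rounds` parameter and the sample input are assumptions made for illustration, not part of the original code. The input is encoded as UTF-8 to mirror raw.getBytes("UTF-8") on the Java side.

```python
import hashlib


def iterated_sha256(raw: bytes, rounds: int = 64000) -> bytes:
    """Apply SHA-256 `rounds` times, feeding each raw 32-byte digest into the next round.

    `iterated_sha256` and `rounds` are hypothetical names used for this sketch.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    # First round hashes the original input.
    digest = hashlib.sha256(raw).digest()
    # Remaining rounds each use a fresh hasher on the previous raw digest.
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


# Hypothetical sample input; in Python 3 a str must be encoded to bytes first.
result = iterated_sha256("example".encode("utf-8"))
print(result.hex())  # 64-character lowercase hex string
```

The digest stays as bytes throughout the loop and is only hex-encoded once at the end, which matches the intent of the pseudocode in the question.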

Upvotes: 6
