Ricky

Reputation: 10505

Replicating Java password hashing code in Python (PBKDF2WithHmacSHA1)

I have been trying to replicate the Java password authentication code in Python, but the resulting hash is different.

password: abcd1234

password token (java): $31$16$sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w

generated password token (python): $pbkdf2$16$c1d5MWRERXg1MnZ3UVVDcw$qPQvE4QbrnYJTmRXk0M7wlfhH5U

From the Java code, the iteration count is 16, the salt should be the first 16 characters of sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w, which is sWy1dDEx52vwQUCs, and the hash should be the remainder, wXDYMQMzTJC39g1_nmrK384T4-w.

However, applying these values in Python gave me a different hash, qPQvE4QbrnYJTmRXk0M7wlfhH5U, which does not match the Java hash.

What am I missing?

Java:

private static final String ALGORITHM = "PBKDF2WithHmacSHA1";
private static final int SIZE = 128;
private static final Pattern layout = Pattern.compile("\\$31\\$(\\d\\d?)\\$(.{43})");

public boolean authenticate(char[] password, String token)
   {
       Matcher m = layout.matcher(token);
       if (!m.matches())
           throw new IllegalArgumentException("Invalid token format");
       int iterations = iterations(Integer.parseInt(m.group(1)));
       byte[] hash = Base64.getUrlDecoder().decode(m.group(2));
       byte[] salt = Arrays.copyOfRange(hash, 0, SIZE / 8);
       byte[] check = pbkdf2(password, salt, iterations);
       int zero = 0;
       for (int idx = 0; idx < check.length; ++idx)
           zero |= hash[salt.length + idx] ^ check[idx];
       return zero == 0;
   }

Python:

from passlib.hash import pbkdf2_sha1

def hasher(password):
   size = 128

   key0 = "abcd1234"
   iter = int(password.split("$")[2])
   salt0 = password.split("$")[3][0: 16]

   hash = pbkdf2_sha1.using(rounds=iter, salt = salt0.encode()).hash(key0)
   print(hash.split('$')[4])

   return hash

Original Link for Java code: How can I hash a password in Java?

Upvotes: 2

Views: 1037

Answers (1)

Eli Collins

Reputation: 8533

There are several differences between how that Java code works and how passlib's pbkdf2_sha1 hasher works.

  • The Java hash string contains a log2 cost parameter, which needs passing through 1<<cost to get the number of rounds / iterations.

  • The salt+digest needs to be base64-decoded as a whole, and then the first 16 bytes are taken as the salt (which actually corresponds to the first 21 1/3 characters of the base64 data).

  • Similarly, since the digest's bits start in the middle of a base64 character, when the salt+digest is decoded and the digest is then encoded separately, the base64 string becomes AzNMkLf2DX-easrfzhPj7A (noticeably different from the original encoded string).
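The three points above can be checked with the standard library alone. The following is a minimal sketch that uses the token from the question; the helper name b64url_nopad is made up for illustration, and the 16-byte split assumes SIZE = 128 as in the Java code shown.

```python
import base64

# Token from the question: $31$<cost>$<base64url(salt + digest)>
token = "$31$16$sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w"


def b64url_nopad(raw):
    """Encode bytes as URL-safe base64 without '=' padding (hypothetical helper)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


_, ident, cost, data = token.split("$")
iterations = 1 << int(cost)  # log2 cost -> iteration count

# Restore the padding the token omits, then decode salt+digest as one blob.
raw = base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))
salt, digest = raw[:16], raw[16:]  # 16 = SIZE / 8 in the Java code

print(iterations)               # 65536
print(len(salt), len(digest))   # 16 16
print(b64url_nopad(salt))       # sWy1dDEx52vwQUCswXDYMQ
print(b64url_nopad(digest))     # AzNMkLf2DX-easrfzhPj7A
```

The re-encoded digest no longer resembles the tail of the original token, because the split at byte 16 falls inside the 22nd base64 character.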

Based on that, the following bit of code converts a Java hash into the format used by pbkdf2_sha1.verify:

from passlib.utils.binary import b64s_decode, ab64_encode

def adapt_java_hash(jhash):
    _, ident, cost, data = jhash.split("$")
    assert ident == "31"
    # URL-safe base64 uses "-" and "_" where standard base64 uses "+" and "/"
    data = b64s_decode(data.replace("_", "/").replace("-", "+"))
    return "$pbkdf2$%d$%s$%s" % (1<<int(cost), ab64_encode(data[:16]),
                                 ab64_encode(data[16:]))

>>> adapt_java_hash("$31$16$sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w")
'$pbkdf2$65536$sWy1dDEx52vwQUCswXDYMQ$AzNMkLf2DX.easrfzhPj7A'

The resulting string should be suitable for passing into pbkdf2_sha1.verify("abcd1234", hash), except for one issue: the Java code truncates the SHA-1 digest to 16 bytes rather than keeping the full 20 bytes, and the way passlib's hasher is coded, the digest must be the full 20 bytes.

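If you alter the Java code to use SIZE=160 instead of SIZE=128, running the hash through the adapt_java_hash() function above should then work in passlib. Note that in the Java code shown, SIZE also sets the salt length (SIZE / 8) and the token pattern is fixed at 43 characters, so the pattern and the 16-byte split point in adapt_java_hash() would need adjusting to match.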
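If changing the Java side is not an option, one possible alternative is to skip passlib and verify the token with hashlib directly, since hashlib.pbkdf2_hmac accepts a dklen argument and can produce the truncated 16-byte key. This is only a sketch: the function name verify_java_token is made up for illustration, and it assumes the password is encoded as UTF-8 and that iterations(cost) in the Java code is 1 << cost, as described above.

```python
import base64
import hashlib
import hmac

SALT_BYTES = 16  # SIZE / 8 in the Java code shown in the question


def verify_java_token(password, token):
    """Check a password against a $31$<cost>$<data> token (hypothetical helper)."""
    _, ident, cost, data = token.split("$")
    if ident != "31":
        raise ValueError("unexpected token identifier")

    # Decode salt+digest together, restoring the omitted '=' padding first.
    raw = base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))
    salt, expected = raw[:SALT_BYTES], raw[SALT_BYTES:]

    # Derive a key of the same (truncated) length as the stored digest.
    derived = hashlib.pbkdf2_hmac(
        "sha1",
        password.encode("utf-8"),  # assumption: UTF-8 password encoding
        salt,
        1 << int(cost),
        dklen=len(expected),
    )
    # Constant-time comparison, like the XOR loop in the Java code.
    return hmac.compare_digest(derived, expected)
```

It would be called as verify_java_token("abcd1234", "$31$16$sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w").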

Upvotes: 3
