Reputation: 727
I have written a SHA-1 function in Java that usually gives the same output as PHP's sha1 function. But when the input contains non-ASCII characters, the results differ. For example, with the string "hj6¬", PHP gives "7f9d591232c5fde9f757c4d8472921517991dc3c" while my Java function gives "c963b7df20488e9ef50c1a309c1fa747ab5d8822". Here is the Java function:
https://github.com/Razican/Java-Utils/blob/master/src/razican/utils/StringUtils.java#L115
Which one is the correct one? How can I implement it in Java?
Upvotes: 0
Views: 889
Reputation: 108879
The correct output is 7f9d591232c5fde9f757c4d8472921517991dc3c. You are dropping a byte:
final MessageDigest md = MessageDigest.getInstance("SHA-1");
md.update(str.getBytes("UTF-8"), 0, str.length());
sha1hash = md.digest();
The above code assumes that the length of the UTF-16 string equals the length of the UTF-8 encoded byte array. If the UTF-8 form is longer than the UTF-16 form, the trailing bytes are never passed to the digest and the result will be incorrect.
codepoint  glyph  escaped  UTF-8  info
=======================================================================
U+0068     h      \u0068   68     BASIC_LATIN, LOWERCASE_LETTER
U+006a     j      \u006a   6a     BASIC_LATIN, LOWERCASE_LETTER
U+0036     6      \u0036   36     BASIC_LATIN, DECIMAL_DIGIT_NUMBER
U+00ac     ¬      \u00ac   c2,ac  LATIN_1_SUPPLEMENT, MATH_SYMBOL
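The mismatch is easy to check directly. Below is a minimal sketch; the class name LengthMismatch and its two helper methods are made up for illustration and are not part of the asker's code:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the two lengths the buggy code mixes up.
final class LengthMismatch {

    // Number of UTF-16 code units, which is what String.length() returns.
    static int utf16Length(String s) {
        return s.length();
    }

    // Number of bytes after encoding the string as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String str = "hj6\u00ac"; // the example string, with the last character escaped
        System.out.println(utf16Length(str)); // 4
        System.out.println(utf8Length(str));  // 5 (68 6a 36 c2 ac)
    }
}
```

With str.length() as the byte count, only 68 6a 36 c2 reaches the digest and the final ac byte is dropped.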
Use the length of the encoded byte array instead:
byte[] utf8 = str.getBytes(StandardCharsets.UTF_8);
md.update(utf8, 0, utf8.length);
You could also use md.update(str.getBytes(StandardCharsets.UTF_8)), which digests the whole array without an explicit length.
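Putting it together, a complete replacement might look like the sketch below. The class name Sha1Util and method name sha1Hex are illustrative, not taken from the linked repository, and the lowercase hex output is chosen to mirror what PHP's sha1 returns by default:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical utility class: SHA-1 of a string's UTF-8 bytes, as lowercase hex.
final class Sha1Util {

    static String sha1Hex(String str) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // Digest the entire encoded array, so no byte count is needed.
            byte[] digest = md.digest(str.getBytes(StandardCharsets.UTF_8));

            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected in practice; wrapped so callers need no checked exception.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("hj6\u00ac"));
    }
}
```

For the example string this should reproduce the PHP digest quoted at the top of this answer, provided the PHP source file itself is saved as UTF-8.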
Upvotes: 1