trollington2

Reputation: 13

What character set do hash functions (e.g. MD5, SHA) use?

I hope this hasn't already been asked. I searched the internet for a week and didn't find an answer. I'm sure it's hidden somewhere in those long books, so if it has been answered here, please direct me to the thread, thanks.

I'd like to know which character set hash functions like MD5 and SHA work with: Base64, ASCII, extended ASCII? If I try to hash, for example, the character Alt+444 (╝), it gets hashed without complaint. But there has to be a limit to how many different characters can be used, because otherwise the hash could not be unique, right? And many sites using these algorithms only let you use the Base64 character pool, I guess. Please help, thank you.

Upvotes: 1

Views: 4519

Answers (1)

martinstoeckli

Reputation: 24131

Hash algorithms take a sequence of bytes, calculate the hash, and return a fixed number of bytes. So it doesn't matter to them whether the input is a short encoded text or a large binary file.
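To illustrate the point about input size, here is a minimal sketch using Python's standard `hashlib` module. The variable names and the one-megabyte buffer are only stand-ins chosen for this example (the buffer plays the role of a large binary file).

```python
import hashlib

short_input = b"hi"                    # a short text, already encoded as bytes
large_input = b"\x00" * 1_000_000      # stand-in for a large binary file

for data in (short_input, large_input):
    digest = hashlib.sha256(data).digest()
    # The digest length depends only on the algorithm, not on the input size.
    print(len(data), "bytes in,", len(digest), "bytes out")
```

Both inputs produce a 32-byte digest; the function never looks at "characters", only at bytes.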

How a string is converted to a byte array depends on the implementation of the hash function and on the programming environment. As long as the conversion is always done the same way, the hashes will be comparable. If you need a cross-platform hash, it is a good idea to first convert the string to a byte array yourself (preferably UTF-8 encoded) and then feed it to the hash function.
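A small sketch of why the encoding step matters, again assuming Python's `hashlib`. The helper `sha256_hex` is a hypothetical name made up for this example, and CP437 is picked only because it is the old DOS code page in which Alt codes produce box-drawing characters like the one in the question.

```python
import hashlib

def sha256_hex(text: str, encoding: str = "utf-8") -> str:
    """Encode the text explicitly, then hash the resulting bytes."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()

text = "\u255d"  # the box-drawing character from the question

# The same character becomes different bytes under different encodings ...
print(text.encode("utf-8"))   # b'\xe2\x95\x9d'
print(text.encode("cp437"))   # b'\xbc'

# ... so the hashes differ, although the "text" is the same.
print(sha256_hex(text, "utf-8") == sha256_hex(text, "cp437"))  # False
```

If both sides agree on UTF-8 before hashing, they hash the same bytes and get the same result regardless of platform.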

The output is often a hexadecimal representation of the hash, but sometimes you can also request the binary output.

Example with SHA-256

SHA256("hello") = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
  • Returns 32 bytes, which is the same as 256 bits, hence the name SHA-256.
  • The 32 bytes are hex encoded: every byte is represented as a two-character hexadecimal number (2c stands for one byte with the decimal value 44).
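The example above can be reproduced with a few lines of Python. This is only a sketch using the standard `hashlib` module; the variable names are chosen for illustration, and the input is assumed to be UTF-8 encoded.

```python
import hashlib

raw = hashlib.sha256("hello".encode("utf-8")).digest()  # binary output: 32 raw bytes
hex_text = raw.hex()                                     # hex output: 64 characters

print(hex_text)                  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(len(raw), len(hex_text))   # 32 64
print(hex_text[:2], raw[0])      # 2c 44

# Hex is just a printable representation; it converts back to the same bytes.
print(bytes.fromhex(hex_text) == raw)  # True
```

So the hex string's alphabet (0-9, a-f) is a property of the output representation only. It says nothing about which characters are allowed in the input.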

Upvotes: 1
