Reputation: 3634
Let me preface this by saying that I am not using this to store passwords or any other sensitive info. I simply want a row-level hash that I can use later, or to quickly check for unique records. My tables will be on the long side, in the range of 0.1 to 10 trillion rows.
I am using a Snowflake data warehouse, so my options are SHA1, SHA2, MD5 (each with a binary variant), and HASH.
I guess I would like to minimize the chance of collisions (given the long tables) while not burning my compute credits needlessly.
Which one is the best option given my use case?
Upvotes: 3
Views: 4141
Reputation: 4729
The built-in HASH function should be good enough if you are OK with accepting some collisions. It can be much faster than the MD5/SHA functions, and it produces good hashes considering its output size, but that output is only 64 bits, so it is more likely to produce collisions than the wider MD5/SHA digests.
If you require zero collisions, no hash function can guarantee that.
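To put rough numbers on "more likely", here is a minimal sketch of the birthday approximation for the row counts in the question. It assumes every row is distinct and that each function behaves like a uniform random hash, which is an idealization. The 64-bit width for HASH is the figure mentioned above; 128, 160, and 256 bits are the standard widths for MD5, SHA-1, and SHA-256. The function names are made up for illustration.

```python
import math

# Digest widths in bits. HASH's 64 bits is taken from the answer above;
# the rest are the standard widths (SHA2 here is assumed to mean SHA-256).
DIGEST_BITS = {"HASH": 64, "MD5": 128, "SHA1": 160, "SHA2": 256}


def expected_colliding_pairs(n_rows: int, bits: int) -> float:
    """Birthday approximation: number of row pairs divided by number of hash values."""
    pairs = n_rows * (n_rows - 1) // 2
    return pairs / 2**bits


def collision_probability(n_rows: int, bits: int) -> float:
    """Approximate chance of at least one collision: 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_rows, bits))


if __name__ == "__main__":
    # 0.1 trillion and 10 trillion rows, the range given in the question.
    for n_rows in (10**11, 10**13):
        for name, bits in DIGEST_BITS.items():
            expected = expected_colliding_pairs(n_rows, bits)
            prob = collision_probability(n_rows, bits)
            print(f"{name:5s} rows={n_rows:.0e} expected_pairs={expected:.3g} p_any={prob:.3g}")
```

Under these assumptions, a 64-bit hash gives on the order of a few hundred expected colliding pairs at 0.1 trillion rows and a few million at 10 trillion rows, while the 128-bit and wider digests stay far below one expected collision at either size.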
MD5/SHA functions are mostly useful when you want to compute a hash of a string in a form compatible with other systems that compute a hash using one of these algorithms.
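As an illustration of that compatibility point, here is a hedged sketch of computing an MD5 row fingerprint outside the warehouse with Python's standard hashlib. The helper name `row_fingerprint`, the `|` separator, and the empty-string treatment of NULLs are all assumptions made up for this example. The result would only match a hash computed elsewhere if both sides concatenate the columns the same way and hash the same UTF-8 bytes.

```python
import hashlib


def row_fingerprint(values, sep="|"):
    """Hypothetical helper: join column values and return the MD5 hex digest.

    None is rendered as an empty string; that is an assumption of this
    sketch, not a rule taken from any particular system.
    """
    text = sep.join("" if v is None else str(v) for v in values)
    # MD5 is used here only as a fingerprint, not for security.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(row_fingerprint(["alice", 42, None]))
```

Note that a plain separator can make different rows produce the same input string (for example when a value itself contains the separator), so escaping or length-prefixing the values may be worth considering.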
Upvotes: 3