user3652621

Reputation: 3634

Use SHA vs MD5 or HASH in Snowflake-db

Let me preface this by saying that I am not using this to store passwords or any other sensitive info. I simply want a row-level SHA/hash that I can use later, or to quickly check for unique records. My tables will be on the long side, in the range of 0.1 to 10 trillion rows.

I am using a Snowflake data warehouse, so my options are SHA1, SHA2, MD5 (each with a binary variant), and HASH.

I guess I would like to minimize the chance of collisions (given the long tables) while not burning my compute credits needlessly.

Which one is the best option given my use case?

Upvotes: 3

Views: 4141

Answers (1)

Marcin Zukowski

Reputation: 4729

The built-in HASH function should be good enough if you are OK with accepting some collisions. It can be considerably faster than the MD5/SHA functions, and it produces good hashes for its output size. However, that output is only 64 bits wide, a smaller range than MD5 or SHA, so collisions are more likely.
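
To put rough numbers on that trade-off, here is a minimal Python sketch of the standard birthday-bound approximation. It is not Snowflake-specific: the function names are made up for illustration, the digest widths (64 bits for HASH, 128 for MD5, 160 for SHA1, 256 for SHA2 at its usual default digest size) are assumptions, and the estimate assumes hash values are uniformly distributed.

```python
import math


def expected_collisions(n_rows: int, hash_bits: int) -> float:
    """Approximate expected number of colliding pairs among n_rows hashes.

    Uses the birthday approximation: (number of pairs) / (number of hash values).
    """
    pairs = n_rows * (n_rows - 1) / 2
    return pairs / 2 ** hash_bits


def collision_probability(n_rows: int, hash_bits: int) -> float:
    """Approximate probability of at least one collision: 1 - exp(-expected)."""
    # expm1 keeps precision when the expected count is tiny.
    return -math.expm1(-expected_collisions(n_rows, hash_bits))


if __name__ == "__main__":
    # Assumed digest widths in bits; labels are for display only.
    widths = {"HASH": 64, "MD5": 128, "SHA1": 160, "SHA2 (256)": 256}
    for n_rows in (10**11, 10**13):  # 0.1 and 10 trillion rows
        for name, bits in widths.items():
            print(
                f"{n_rows:.0e} rows, {name}: "
                f"expected collisions ~{expected_collisions(n_rows, bits):.3g}, "
                f"P(any collision) ~{collision_probability(n_rows, bits):.3g}"
            )
```

Under these assumptions, a 64-bit hash over 10 trillion rows gives an expected number of colliding pairs in the millions, while a 128-bit digest brings the chance of any collision down to roughly 1e-13.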

If you require no collisions at all, no hash function can guarantee that, obviously.

MD5/SHA functions are mostly useful when you want to compute a hash of a string in a form compatible with other systems computing a hash using one of these algorithms.
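
As an illustration of that compatibility point, this is a sketch of how an external Python process might compute such digests with the standard `hashlib` module. It assumes both sides hash the identical UTF-8 byte sequence and compare lowercase hex output; the `row_digest` helper, the `|` separator, and the NULL handling are hypothetical choices, not something Snowflake prescribes.

```python
import hashlib


def row_digest(values, algorithm="md5", sep="|"):
    """Hex digest of a row's values joined with a separator (hypothetical scheme)."""
    # None is rendered as an empty string here, which is an arbitrary choice:
    # it makes NULL and '' indistinguishable, so pick a different marker if
    # that matters for your data.
    text = sep.join("" if v is None else str(v) for v in values)
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    row = ("alice", 42, None)
    for algo in ("md5", "sha1", "sha256"):
        print(algo, row_digest(row, algo))
```

For the digests to match, the SQL side would need to build exactly the same concatenated string before hashing it; any difference in separators, NULL handling, or number formatting changes the result.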

Upvotes: 3
