Aditya
Aditya

Reputation: 133

Are hashcodes good for joins

I am new to Snowflake and want to know, can we use hashcodes for joining tables or finding unique records or deleting duplicate records in Snowflake(or in any other database in general)? I am designing an ETL flow, what are the advantages or disadvantages of using hashcodes and why are they generally not used often in most Data warehousing designs?

Upvotes: 0

Views: 815

Answers (1)

patrick_at_snowflake
patrick_at_snowflake

Reputation: 453

If you mean hashing with something like md5_binary or sha1_binary then yes absolutely,

Binary values are half the byte length of the equivalent varchar length and so you should use that. The benefit of using hash-keys (effectively) is that you only need a single join column if for instance the natural keys of a table might be a composite key. Now you could instead a numeric/int data type, sequence key but that imposes a load order. Example only after the related dimension tables have loaded should you build the related fact table --- if you are doing that.

Data Vault prefers durable hash-keys because it does not impose any load ordering, load in any order independently.

Anyway I digress, yes hash-keys have great advantages, just make sure they're binary data types when loaded.

Upvotes: 1

Related Questions