user3776623
user3776623

Reputation: 25

HASH function with 256-bit in Snowflake

I am creating a masking policy in Snowflake (Enterprise version) to mask non varchar characters like Numeric/Integer/Float etc. I want to use hash function with returns 256-bit value.

I tried using HASH(<field_name>) which works fine but returns a 64-bit value. Is there any other hash function which works on number data type and returns 256-bit value. I am also fine if that needs to do type conversion.

Tried with below option but no luck.

CREATE OR REPLACE masking policy Stage.ID_Policy as (ID string) returns string ->
CASE
    WHEN current_role() IN ('ACCOUNTADMIN') THEN ID
    --ELSE HASH(ID::NUMERIC(38,0))
    --ELSE HASH_AGG(ID::NUMERIC(38,0))
    --ELSE SHA2_HEX(ID)
    --ELSE CAST(sha2(TO_VARCHAR(ID),256) as NUMERIC)  
    ELSE MD5(ID)
END;

Upvotes: 0

Views: 3856

Answers (2)

Gokhan Atil
Gokhan Atil

Reputation: 10109

MD5 returns 32-characters hex-encoded string. 256 bits makes 32 bytes, so it should be Ok for you if your aim is to get a 256-bit value.

If you need a function that uses a 256-bit digest to hash the data, then you can use SHA2. It will return 64 characters because 4-bits is enough to represent a hexadecimal char.

SHA2 ve MD5 accepts string or binary data, so you don't need to change your string to number as it will be converted to string again.

To return number:

CREATE OR REPLACE masking policy Stage.ID_Policy as (ID NUMERIC) returns NUMERIC -> 
  CASE WHEN current_role() IN ('ACCOUNTADMIN') THEN ID 
  ELSE TO_NUMBER(MD5(ID), 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')  END;

Upvotes: 1

patrick_at_snowflake
patrick_at_snowflake

Reputation: 453

I may be wrong but the way I am reading your code is that you're inputing strings, converting them to numeric and then applying totally different hash functions on the input before outputting as string.

Each hash function has a particular use case and variance in terms of output, and code.

hash_agg requires aggregation, i.e. a group by

Anyway back to your query, if you look up the Wiki for SHA2 the output length would be consistant but I think the 256 bits is referring to the number of bits used, not characters / word size. https://en.wikipedia.org/wiki/SHA-2

Added tip, try to stick to explicit conversions in data types, because the output cast to numeric in some cases it forces Snowflake to detect the different and apply a cast implicitly -- -that will be slower

Upvotes: 1

Related Questions