Reputation: 25
I am creating a masking policy in Snowflake (Enterprise version) to mask non varchar characters like Numeric/Integer/Float etc. I want to use hash function with returns 256-bit value.
I tried using HASH(<field_name>) which works fine but returns a 64-bit value. Is there any other hash function which works on number data type and returns 256-bit value. I am also fine if that needs to do type conversion.
Tried with below option but no luck.
CREATE OR REPLACE masking policy Stage.ID_Policy as (ID string) returns string ->
CASE
WHEN current_role() IN ('ACCOUNTADMIN') THEN ID
--ELSE HASH(ID::NUMERIC(38,0))
--ELSE HASH_AGG(ID::NUMERIC(38,0))
--ELSE SHA2_HEX(ID)
--ELSE CAST(sha2(TO_VARCHAR(ID),256) as NUMERIC)
ELSE MD5(ID)
END;
Upvotes: 0
Views: 3856
Reputation: 10109
MD5 returns 32-characters hex-encoded string. 256 bits makes 32 bytes, so it should be Ok for you if your aim is to get a 256-bit value.
If you need a function that uses a 256-bit digest to hash the data, then you can use SHA2. It will return 64 characters because 4-bits is enough to represent a hexadecimal char.
SHA2 ve MD5 accepts string or binary data, so you don't need to change your string to number as it will be converted to string again.
To return number:
CREATE OR REPLACE masking policy Stage.ID_Policy as (ID NUMERIC) returns NUMERIC ->
CASE WHEN current_role() IN ('ACCOUNTADMIN') THEN ID
ELSE TO_NUMBER(MD5(ID), 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') END;
Upvotes: 1
Reputation: 453
I may be wrong but the way I am reading your code is that you're inputing strings, converting them to numeric and then applying totally different hash functions on the input before outputting as string.
Each hash function has a particular use case and variance in terms of output, and code.
hash_agg requires aggregation, i.e. a group by
Anyway back to your query, if you look up the Wiki for SHA2 the output length would be consistant but I think the 256 bits is referring to the number of bits used, not characters / word size. https://en.wikipedia.org/wiki/SHA-2
Added tip, try to stick to explicit conversions in data types, because the output cast to numeric in some cases it forces Snowflake to detect the different and apply a cast implicitly -- -that will be slower
Upvotes: 1