titan31
titan31

Reputation: 195

Snowflake Data Masking

We're looking to mask certain PII in our Snowflake environment where it relates to team members, and at the moment our masking is set up to mask every row in the column we define in our masking policies.

What we'd like to get to though is only masking only rows contain a membership number in a separate table. Is that possible to implement or how would I go about doing it?

member name
A acds
B asdas
C asdeqw
member
B

Just as an example, in the above tables, we'd only want to mask member B. At the moment, all 3 rows in the first table would be masked.

We've a possible workaround in doing this in logic of an extra view but that's actually altering the data, whereas our hope was we could use the Dynamic Data Masking and then have exception processes for it.

Upvotes: 0

Views: 1797

Answers (3)

Pravallika Chunduru
Pravallika Chunduru

Reputation: 1

As I understand, you want to mask one column in your table based on another column and also lookup. We can use conditional masking in this case - https://docs.snowflake.com/en/sql-reference/sql/create-masking-policy.html#conditional-masking-policy

create or replace masking policy name_mask as (val string, member_id string) returns string ->
  case
  when exists
    (
        select 1
        from member_to_be_masked m
        where m.member_id = member_id
    )
    then '********'
  else val
end;

In query profile, it would come as a secure function. Please evaluate the peformance. Based on the total records for which this function has to be applied, peformance difference may be significant

Upvotes: 0

Eric Lin
Eric Lin

Reputation: 1510

Keeping my previous answer in case it can still be useful.

Another workaround I can think of is to use variant data and then create a view on top of it.

  1. prepare the data in JSON format:
create or replace table member_json (member_id varchar, data variant);
insert into member_json
select 
'A', parse_json('{"member_id": "A", "name" : "member_a"}')
union
select 
'B', parse_json('{"member_id": "B", "name" : "member_b"}')
union
select 
'C', parse_json('{"member_id": "C", "name" : "member_c"}')
;

create or replace table member_to_be_masked(member_id varchar);
insert into member_to_be_masked values ('B');

Data looks like below:

select * from member_json;

+-----------+----------------------+
| MEMBER_ID | DATA                 |
|-----------+----------------------|
| A         | {                    |
|           |   "member_id": "A",  |
|           |   "name": "member_a" |
|           | }                    |
| B         | {                    |
|           |   "member_id": "B",  |
|           |   "name": "member_b" |
|           | }                    |
| C         | {                    |
|           |   "member_id": "C",  |
|           |   "name": "member_c" |
|           | }                    |
+-----------+----------------------+

select * from member_to_be_masked;

+-----------+
| MEMBER_ID |
|-----------|
| B         |
+-----------+
  1. create a JS UDF:
create or replace function json_mask(mask boolean, v variant)
returns variant
language javascript
as
$$
    if (MASK) {
        V["member_id"] = '******'
        V["name"] = '******';
    }
    return V;
$$;
  1. create a masking policy using the UDF:
create or replace masking policy member_mask 
as (val variant) 
returns variant ->
  case
  when exists
    (
        select 
            member_id 
        from 
            member_to_be_masked 
        where member_id = val['member_id']
    ) 
    then json_mask(true, val)
  else val
end;
  1. apply the policy to the member_json table:
alter table if exists member_json 
modify column data 
set masking policy member_mask;
  1. query the table will see member B masked:
select * from member_json;

+-----------+--------------------------+
| MEMBER_ID | DATA                     |
|-----------+--------------------------|
| A         | {                        |
|           |   "member_id": "A",      |
|           |   "name": "member_a"     |
|           | }                        |
| B         | {                        |
|           |   "member_id": "******", |
|           |   "name": "******"       |
|           | }                        |
| C         | {                        |
|           |   "member_id": "C",      |
|           |   "name": "member_c"     |
|           | }                        |
+-----------+--------------------------+
  1. create a view on top of it:
create or replace view member_view
as
select 
    data:"member_id" as member_id,
    data:"name" as name
from member_json;
  1. query the view will see masked data as well:
select * from member_view;
+-----------+------------+
| MEMBER_ID | NAME       |
|-----------+------------|
| "A"       | "member_a" |
| "******"  | "******"   |
| "C"       | "member_c" |
+-----------+------------+

Not sure if this can help in your case use.

Upvotes: 0

Eric Lin
Eric Lin

Reputation: 1510

Prepare the data:

create or replace table member (member_id varchar, name varchar);
insert into member values ('A', 'member_a'),('B', 'member_b'),('C', 'member_c');

create or replace table member_to_be_masked(member_id varchar);
insert into member_to_be_masked values ('B');

If you want to mask the member column:

create or replace masking policy member_mask as (val string) returns string ->
  case
  when exists
    (
        select 
            member_id 
        from 
            member_to_be_masked 
        where member_id = val
    ) 
    then '********'
  else val
end;

alter table if exists member 
modify column member_id 
set masking policy member_mask;

select * from member;

+-----------+----------+
| MEMBER_ID | NAME     |
|-----------+----------|
| A         | member_a |
| ********  | member_b |
| C         | member_c |
+-----------+----------+

However, if you want to mask the name column, I don't see an easy way. I have tried the policy to link back to the table itself to find out if the member_id for the current column name value, but it fails with the below error message:

Policy body contains a UDF or Select statement that refers to a Table attached to another Policy.

It looks like that in the policy, we can't reference back to the source table. And because the policy will only get the value of the defined column value, it has no knowledge of other column values, so we can't make decision on whether to apply the mask or not.

If can work if you also store the "name" into the mapping table, together with the member_id, like the below:

create or replace table member (member_id varchar, name varchar);
insert into member values ('A', 'member_a'),('B', 'member_b'),('C', 'member_c');

create or replace table member_to_be_masked(member_id varchar, name varchar);
insert into member_to_be_masked values ('B', 'member_b');

create or replace masking policy member_mask as (val string) returns string ->
  case
  when exists
    (
        select member_id 
        from member_to_be_masked 
        where name = val
    )
    then '********'
  else val
end;

alter table if exists member 
modify column name 
set masking policy member_mask;

select * from member;
+-----------+----------+
| MEMBER_ID | NAME     |
|-----------+----------|
| A         | member_a |
| B         | ******** |
| C         | member_c |
+-----------+----------+

The downside of this approach is that if different members with the same name, all members with this name will be masked, regardless if the member's id is in the mapping table or not.

Upvotes: 1

Related Questions