jhall-upsolver
jhall-upsolver

Reputation: 11

How do you mask parts of an IP address in a data transformation

I have a data source that includes IP addresses. I want to preserve the structure of the IP addresses in my transformation (do not want to use MD5 to hash them), but I would like to mask certain parts of the IP. For example if my source data includes 192.168.105.36, I may want the output to include 192.168.105.xxx. I might also want 192.168.xxx.xxx. The source IP addresses could have varying numbers of digits, for example 192.168.1.5 or 192.168.10.105, making it hard to parse the string by character counts. Any ideas?

Upvotes: 0

Views: 74

Answers (1)

jhall-upsolver
jhall-upsolver

Reputation: 11

There are several ways to do this, but once of the most flexible ways is to use the SPLIT function to break the IP address into an array, and then mask/replace each part before concatenating it back together. Here is an example transformation job that applies a slightly modified mask to two different ip addresses. The first example (masked_dstaddr) masks the last two parts of the IP (i.e. 192.168.xxx.xxx), and the second example (masked_srcaddr) only masks the last part. Check out the LET statement in the below job to see the use of the SPLIT function, along with ELEMENT_AT() to return each array part, and || to concatenate the IP address back together:

CREATE JOB mask_flow_logs
    START_FROM = BEGINNING  
    ADD_MISSING_COLUMNS = TRUE
    RUN_INTERVAL = 1 MINUTE
AS INSERT INTO default_glue_catalog.database_e809da.masked_vpc_logs MAP_COLUMNS_BY_NAME 
SELECT 
    new_destip as masked_dstaddr,
    new_srcip as masked_srcaddr
FROM default_glue_catalog.database_e809da.vpc_flow_logs_raw
    LET destip_arr = SPLIT(dstaddr,'.'),
        new_destip = ELEMENT_AT(destip_arr[],1) || '.' || ELEMENT_AT(destip_arr[],2) || '.' || 'yyy' || '.xxx',
        sourceip_arr = SPLIT(srcaddr,'.'),
        new_srcip = ELEMENT_AT(sourceip_arr[],1) || '.' || ELEMENT_AT(sourceip_arr[],2) || '.' || ELEMENT_AT(sourceip_arr[],3) || '.xxx'
WHERE $commit_time BETWEEN RUN_START_TIME() AND RUN_END_TIME();

Here is the resulting data transformation showing the different masks:

enter image description here

Upvotes: 0

Related Questions