Roshan Fernando
Roshan Fernando

Reputation: 525

How to perform sql aggregation on Snowflake array and output multiple arrays?

I have a snowflake array as below rows which is an input, which I would want to check for each value in the array value and spit as multiple output arrays based on the value's length for values with 5 digits as one column, and values with 6 digits as another column.

ID_COL,ARRAY_COL_VALUE
1,[22,333,666666]
2,[1,55555,999999999]
3,[22,444]

Output table:

ID_COL,FIVE_DIGIT_COL,SIX_DIGIT_COL
1,[],[666666]
2,[555555],[]
3,[],[]

Please let me know if we could iterate through each array value and perform SQL aggregation to check column length and then output as a separate column outputs. Creating it through SQL would be great, but UDFs using javascript, python if an option would also be great.

Upvotes: 0

Views: 1036

Answers (2)

Rajat
Rajat

Reputation: 5803

If you're dealing strictly with numbers here's another way

with cte (id, array_col) as
(select 1,[22,333,666666,666666] union all
 select 2,[1,22222,55555,999999999] union all
 select 3,[22,444])

select *, 
       concat(',',array_to_string(array_col,',,'),',') as str_col,
       regexp_substr_all(str_col,',([^,]{5}),',1,1,'e') as len_5,
       regexp_substr_all(str_col,',([^,]{6}),',1,1,'e') as len_6
from cte;

The basic idea is to turn that array into a string and keep all the digits surrounded by , so that we can parse the pattern using regex_substr_all.

If you're dealing with strings, you can modify it to use a delimiter that won't show up in your data.

Upvotes: 0

Lukasz Szozda
Lukasz Szozda

Reputation: 175726

Using SQL and FLATTEN:

CREATE OR REPLACE TABLE t(ID_COL INT,ARRAY_COL_VALUE VARIANT)
AS
SELECT 1,[22,333,666666] UNION ALL
SELECT 2,[1,55555,999999999] UNION ALL
SELECT 3,[22,444];

Query:

SELECT ID_COL, 
  ARRAY_AGG(CASE WHEN s.value BETWEEN 10000 AND 99999 THEN s.value END) AS FIVE_DIGIT_COL,
  ARRAY_AGG(CASE WHEN s.value BETWEEN 100000 AND 999999 THEN s.value END) AS SIX_DIGIT_COL
FROM t, TABLE(FLATTEN(ARRAY_COL_VALUE)) AS s
GROUP BY ID_COL;

And Python UDF:

create or replace function filter_arr(arr variant, num_digits INT)
returns variant
language python
runtime_version = 3.8
handler = 'main'
as $$
def main(arr, num_digits):
    return [x for x in arr if len(str(x))==num_digits]
$$;

SELECT ID_COL, 
      ARRAY_COL_VALUE,
      filter_arr(ARRAY_COL_VALUE, 5),
      filter_arr(ARRAY_COL_VALUE, 6)
FROM t;

Output:

enter image description here

Upvotes: 1

Related Questions