Omer Shacham
Omer Shacham

Reputation: 668

List sorting in Snowflake as part of the select

lets assume im having to following query

select column1, count(*) from (values ('a'), ('b'), ('ab'), ('ba')) group by 1;

COLUMN1 COUNT(*)
a   1
b   1
ba  1
ab  1

and I want that my grouping will be order insensitive means that I want to count ab and ba as the same value. so the expected result will be

COLUMN1 COUNT(*)
a   1
b   1
ab  2

I thought about sorting the select so it will handle them both as the same value but I didnt find any option to sort the value in snowflake.

Upvotes: 3

Views: 801

Answers (2)

Lukasz Szozda
Lukasz Szozda

Reputation: 175596

SQL is expressive enough to achieve this effect without using JS/Java/Python functions.

Preparation - rearranging input string:

SELECT column1, seq,
       LISTAGG(sub.value) WITHIN GROUP(ORDER BY sub.value) AS sorted_column1
FROM (VALUES ('a'), ('b'), ('ab'), ('ba')) AS src
,TABLE(SPLIT_TO_TABLE(TRIM(REGEXP_REPLACE(column1, '(.)', '\\1~'), '~'), '~')) AS sub
GROUP BY column1, seq;

Output:

COLUMN1 SORTED_COLUMN1
ba ab
b b
ab ab
a a

WITH cte AS (
  SELECT src.column1, sub.seq, 
         LISTAGG(sub.value) WITHIN GROUP(ORDER BY sub.value) AS sorted_column1
  FROM (VALUES ('a'), ('b'), ('ab'), ('ba')) AS src
  ,TABLE(SPLIT_TO_TABLE(TRIM(REGEXP_REPLACE(column1,'(.)', '\\1~'),'~'),'~')) AS sub
  GROUP BY src.column1, sub.seq
)
SELECT sorted_column1, COUNT(*)
FROM cte
GROUP BY sorted_column1;

Output:

SORTED_COLUMN1 COUNT(*)
ab 2
b 1
a 1

Upvotes: 0

Greg Pavlik
Greg Pavlik

Reputation: 11046

There may be a way to do this in a SQL, but a JavaScript UDF makes it easy:

create or replace function SORT_STRING(TEXT string)
returns string
language javascript
strict immutable
as
$$
    return TEXT.split('').sort().join('');
$$;

select SORT_STRING(column1) SORTED_C1, count(*) 
from (values ('a'), ('b'), ('ab'), ('ba')) group by 1;
SORTED_C1 COUNT(*)
a 1
b 1
ab 2

Upvotes: 10

Related Questions