Reputation: 668
lets assume im having to following query
select column1, count(*) from (values ('a'), ('b'), ('ab'), ('ba')) group by 1;
COLUMN1 COUNT(*)
a 1
b 1
ba 1
ab 1
and I want that my grouping will be order insensitive means that I want to count ab
and ba
as the same value.
so the expected result will be
COLUMN1 COUNT(*)
a 1
b 1
ab 2
I thought about sorting the select so it will handle them both as the same value but I didnt find any option to sort the value in snowflake.
Upvotes: 3
Views: 801
Reputation: 175596
SQL is expressive enough to achieve this effect without using JS/Java/Python functions.
Preparation - rearranging input string:
SELECT column1, seq,
LISTAGG(sub.value) WITHIN GROUP(ORDER BY sub.value) AS sorted_column1
FROM (VALUES ('a'), ('b'), ('ab'), ('ba')) AS src
,TABLE(SPLIT_TO_TABLE(TRIM(REGEXP_REPLACE(column1, '(.)', '\\1~'), '~'), '~')) AS sub
GROUP BY column1, seq;
Output:
COLUMN1 | SORTED_COLUMN1 |
---|---|
ba | ab |
b | b |
ab | ab |
a | a |
WITH cte AS (
SELECT src.column1, sub.seq,
LISTAGG(sub.value) WITHIN GROUP(ORDER BY sub.value) AS sorted_column1
FROM (VALUES ('a'), ('b'), ('ab'), ('ba')) AS src
,TABLE(SPLIT_TO_TABLE(TRIM(REGEXP_REPLACE(column1,'(.)', '\\1~'),'~'),'~')) AS sub
GROUP BY src.column1, sub.seq
)
SELECT sorted_column1, COUNT(*)
FROM cte
GROUP BY sorted_column1;
Output:
SORTED_COLUMN1 | COUNT(*) |
---|---|
ab | 2 |
b | 1 |
a | 1 |
Upvotes: 0
Reputation: 11046
There may be a way to do this in a SQL, but a JavaScript UDF makes it easy:
create or replace function SORT_STRING(TEXT string)
returns string
language javascript
strict immutable
as
$$
return TEXT.split('').sort().join('');
$$;
select SORT_STRING(column1) SORTED_C1, count(*)
from (values ('a'), ('b'), ('ab'), ('ba')) group by 1;
SORTED_C1 | COUNT(*) |
---|---|
a | 1 |
b | 1 |
ab | 2 |
Upvotes: 10