Reputation: 431

Snowflake - Count distinct values in comma seperated list

I basically have a column that looks like below.

"[
  ""what"",
  ""how"",
]"
"[
  ""how"",
  ""what"",

]"
"[
  ""project_management"",
  ""do it"",
  ""personal""
]"
"[
  ""do it"",
  ""finance"",
  ""events"",
  ""save""
]"
"[
  ""do it"",
  ""sales"",
  ""events""
]"
"[
  ""finance"",
  ""sales"",
  ""events""
]"
"[
  ""events""
]"

I am simple trying to get a count of each unique instance/value within the column and output value counts for each value that is seperated by a string. The output should look like the following:

What: 2 
how: 2
do it: 3 
Finance: 4 
etc.

I tried the following but the problem is it only counts lists that repeat itself and not the individual values within the list itself

select (i.OUTCOMES), count(i.OUTCOMES)
from table i
GROUP BY 1;

Upvotes: 0

Answers (3)

akshindesnowflake

Reputation: 601

Using SPLIT_TO_TABLE & REPLACE FUNCTIONS

SELECT COL_VAL,COUNT(COL_VAL) FROM

(

SELECT REPLACE(REPLACE(REPLACE(VALUE,'['),'"'),']') COL_VAL FROM TABLE( SPLIT_TO_TABLE('"[ ""what"", ""how"", ]" "[ ""how"", ""what"",

]" "[ ""project_management"", ""do it"", ""personal"" ]" "[ ""do it"", ""finance"", ""events"", ""save"" ]" "[ ""do it"", ""sales"", ""events"" ]" "[ ""finance"", ""sales"", ""events"" ]" "[ ""events"" ]"',','))) GROUP BY COL_VAL;

Upvotes: 1

Gokhan Atil

Reputation: 10199

It seems like an array, so you need to use flatten two times:

with data as (
select ARRAY_CONSTRUCT( ARRAY_CONSTRUCT('what','how'), 
ARRAY_CONSTRUCT('how','what'), 
ARRAY_CONSTRUCT('project_management','do it', 'personal') ) OUTCOMES
)
select item.VALUE::string, count(*) from data,
lateral flatten( OUTCOMES ) v,
lateral flatten( v.VALUE ) item
group by item.VALUE; 

+--------------------+----------+
| ITEM.VALUE::STRING | COUNT(*) |
+--------------------+----------+
| what               |        2 |
| how                |        2 |
| project_management |        1 |
| do it              |        1 |
| personal           |        1 |
+--------------------+----------+

Upvotes: 1

Felipe Hoffa

Reputation: 59375

You'll need to flatten the values.

If the variant is an array as described:

with data as (
select parse_json('["a", "b"]') v
union select parse_json('["a", "a", "c"]')
)

select x.value::string val, count(*) c
from data, table(flatten(v)) x
group by 1
;

Upvotes: 2

Snowflake - Count distinct values in comma seperated list

Answers (3)

Related Questions