Nelson Auner
Nelson Auner

Reputation: 1509

Flatten and aggregate two columns of arrays via distinct in Snowflake

Table structure is

+------------+---------+
|  Animals   |  Herbs  |
+------------+---------+
| [Cat, Dog] | [Basil] |
| [Dog, Lion]| []      |
+------------+---------+

Desired output (don't care about sorting of this list):

unique_things
+------------+
[Cat, Dog, Lion, Basil]

First attempt was something like

SELECT ARRAY_CAT(ARRAY_AGG(DISTINCT(animals)), ARRAY_AGG(herbs))

But this produces

[[Cat, Dog], [Dog, Lion], [Basil], []]

Since the distinct is operating on each array, not looking at distinct components within all arrays

Upvotes: 6

Views: 4222

Answers (3)

Lukasz Szozda
Lukasz Szozda

Reputation: 176144

UPDATE

It is possible without using FLATTEN, by using ARRAY_UNION_AGG:

Returns an ARRAY that contains the union of the distinct values from the input ARRAYs in a column.

For sample data:

CREATE OR REPLACE TABLE t AS
SELECT ['Cat', 'Dog'] AS Animals, ['Basil'] AS Herbs
UNION SELECT ['Dog', 'Lion'], [];

Query:

SELECT ARRAY_UNION_AGG(ARRAY_CAT(Animals, Herbs)) AS Result
FROM t

or:

SELECT ARRAY_UNION_AGG(Animals) AS Result
FROM (SELECT Animals FROM t 
      UNION ALL
      SELECT Herbs FROM t);

Output:

enter image description here


You could flatten the combined array and then aggregate back:

SELECT ARRAY_AGG(DISTINCT F."VALUE") AS unique_things
FROM tab, TABLE(FLATTEN(ARRAY_CAT(tab.Animals, tab.Herbs))) f

Upvotes: 4

Soni
Soni

Reputation: 76

Here is another variation to handle NULLs in case they appear in data set.

SELECT ARRAY_AGG(DISTINCT a.VALUE) unique_things from tab, TABLE (FLATTEN(array_compact(array_append(tab.Animals, tab.Herbs)))) a

Upvotes: 1

MMV
MMV

Reputation: 990

If I understand your requirements right and assuming the source tables of

insert into tabarray select array_construct('cat', 'dog'), array_construct('basil');
insert into tabarray select array_construct('lion', 'dog'), null;

I would say the result would look like this:

select array_agg(distinct value) from
(
  select
    value from tabarray
  , lateral flatten( input => col1 )
  union all
  select
  value from tabarray
  , lateral flatten( input => col2 ))
  ;

Upvotes: 3

Related Questions