Lukasz Szozda
Lukasz Szozda

Reputation: 176124

Snowflake - ARRAY_AGG with LIMIT

We can use ARRAY_AGG aggregate function to generate an array from data.

The goal is to find a way to limit the input to specific number of entries like ARRAY_AGG(...) WITHIN GROUP(... LIMIT 3) without restructuring the main query.

Sidenote: UDAF (User-Defined Aggregate Functions) are not available at the moment of writing.

For sample data:

CREATE OR REPLACE TABLE tab(grp TEXT, col TEXT) AS
SELECT * FROM VALUES 
  ('Grp1', 'A'),('Grp1', 'B'),('Grp1', 'C'),('Grp1', 'D'), ('Grp1', 'E'),
  ('Grp2', 'X'),('Grp2', 'Y'),('Grp2', 'Z'),('Grp2', 'V'),
  ('Grp3', 'M'),('Grp3', 'N'),('Grp3', 'M');

Output:

GRP ARR_LIMIT_3
Grp3    [   "M",   "M",   "N" ]
Grp2    [   "V",   "X",   "Y" ]
Grp1    [   "A",   "B",   "C" ]

Usage of ARRAY_SLICE is not an option if underlying ARRAY_AGG exceeds 16MB

SELECT grp, 
       ARRAY_SLICE(ARRAY_AGG(col), 1,3)) 
FROM big_table
JOIN ...
GROUP BY grp;
-- Result array of ARRAYAGG is too large

Upvotes: 2

Views: 1901

Answers (1)

Lukasz Szozda
Lukasz Szozda

Reputation: 176124

It is possible to achieve similar effect by using MIN_BY/MIN_MAX function:

MIN_BY( <col_to_return>, <col_containing_mininum> [ , <maximum_number_of_values_to_return> ] )

SELECT grp,
       ARRAY_AGG(col) AS arr,
       MIN_BY(col, col, 3) AS arr_limit_3
FROM tab
GROUP BY grp;

Output:

GRP ARR ARR_LIMIT_3
Grp3 [ "M", "N", "M" ] [ "M", "M", "N" ]
Grp2 [ "X", "Y", "Z", "V" ] [ "V", "X", "Y" ]
Grp1 [ "A", "B", "C", "D", "E" ] [ "A", "B", "C" ]

If the sorting is irrelevant then MIN_BY(col, 'some_constant', 3).


ARRAY_UNIQUE_AGG or ARRAY_AGG(DISTINCT ...) is:

SELECT grp,
       ARRAY_UNIQUE_AGG(col) AS arr,
       ARRAY_AGG(DISTINCT col) AS arr2,
       ARRAY_DISTINCT(MIN_BY(col, col, 3)) AS arr_distinct_limit_3
FROM tab
GROUP BY grp;

Output:

GRP ARR ARR2 ARR_DISTINCT_LIMIT_3
Grp3 [ "M", "N" ] [ "M", "N" ] [ "M", "N" ]
Grp2 [ "X", "Y", "Z", "V" ] [ "X", "Y", "Z", "V" ] [ "V", "X", "Y" ]
Grp1 [ "A", "B", "C", "D", "E" ] [ "A", "B", "C", "D", "E" ] [ "A", "B", "C" ]

It is possible to handle WITHIN GROUP(ORDER BY <some_col> ASC/DESC) too:

SELECT grp,
       -- ASC
       ARRAY_AGG(col) WITHIN GROUP(ORDER BY some_col),
       MIN_BY(col, some_col, 3),
       -- DESC
       ARRAY_AGG(col) WITHIN GROUP(ORDER BY some_col DESC),
       MAX_BY(col, some_col, 3)
FROM ...;

Upvotes: 4

Related Questions