Wells
Wells

Reputation: 10969

Distinct values from an array?

Following tables:

CREATE TEMPORARY TABLE guys ( guy_id integer primary key, guy text );
CREATE TEMPORARY TABLE sales ( log_date date, sales_guys integer[], sales smallint );
INSERT INTO guys VALUES(1,'john'),(2,'joe');
INSERT INTO sales VALUES('2016-01-01', '{1,2}', 2),('2016-01-02','{1,2}',4);

Following query works great to show names on a given date:

SELECT log_date, sales_guys, ARRAY_AGG(guy), sales 
FROM sales 
JOIN guys ON 
   guys.guy_id = ANY(sales.sales_guys) 
GROUP BY log_date, sales_guys, sales 
ORDER BY log_date ASC;

  log_date  | sales_guys | array_agg  | sales 
------------+------------+------------+-------
 2016-01-01 | {1,2}      | {john,joe} |     2
 2016-01-02 | {1,2}      | {john,joe} |     4

Following query problematically gives me a name per date per guy, so here each name twice, and so on):

SELECT sales_guys, ARRAY_AGG(guy), SUM(sales) AS sales
FROM sales
JOIN guys ON guys.guy_id = ANY(sales.sales_guys)
GROUP BY sales_guys;

Yields:

 sales_guys |      array_agg      | sales 
------------+---------------------+-------
 {1,2}      | {john,joe,john,joe} |    12

Is there a way to somehow reduce the ARRAY_AGG call to give only the unique names?

Upvotes: 0

Views: 13669

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656241

There is no kind of order you can trust without ORDER BY. Except that elements of arrays, when unnested, come in array order. If your query does more with the result, it may be re-ordered, though.

You an simply add ORDER BY to any aggregate function in Postgres:

SELECT s.sales_guys, ARRAY_AGG(DISTINCT g.guy ORDER BY g.guy) AS names, SUM(s.sales) AS sum_sales
FROM   sales s
JOIN   guys  g ON g.guy_id = ANY(s.sales_guys)
GROUP  BY s.sales_guys;

But that's obviously not the original order of array elements. And the query has other issues ... Neither IN nor = ANY() care about order of elements in the set, list or array on the right side:

Proper solution

For this task (attention to the details!):

Get the total sales per array sales_guys, where the order of elements makes a difference (arrays '{1,2}' and '{2,1}' are not the same) and sales_guys has neither duplicate nor NULL elements. Add an array of resolved names in matching order.

Use unnest() with WITH ORDINALITY. and aggregate arrays before you resolve names, that's cheaper and less error prone.

SELECT s.*, g.
FROM  (
   SELECT sales_guys, sum (sales) AS total_sales                -- aggregate first in subquery
   FROM   sales
   GROUP  BY 1
   ) s
, LATERAL (
   SELECT array_agg(guy ORDER BY ord) AS names                  -- order by original order
   FROM   unnest(s.sales_guys) WITH ORDINALITY sg(guy_id, ord)  -- with order of elements
   LEFT   JOIN guys g USING (guy_id)                            -- LEFT JOIN to add NULL for missing guy_id
   ) g;

The LATERAL subquery can be joined with unconditional CROSS JOIN - comma (,) is shorthand notation - because the aggregate in the subquery guarantees a result for every row. Else you'd use LEFT JOIN LATERAL .. ON true.

Detailed explanation:

Upvotes: 2

clemens
clemens

Reputation: 17711

You can use DISTINCT inside of the aggregate:

SELECT sales_guys, ARRAY_AGG(DISTINCT guy), SUM(sales) AS sales FROM sales JOIN guys ON guys.guy_id = ANY(sales.sales_guys) GROUP BY sales_guys;

Upvotes: 0

Related Questions