Reputation: 10969
Following tables:
CREATE TEMPORARY TABLE guys ( guy_id integer primary key, guy text );
CREATE TEMPORARY TABLE sales ( log_date date, sales_guys integer[], sales smallint );
INSERT INTO guys VALUES(1,'john'),(2,'joe');
INSERT INTO sales VALUES('2016-01-01', '{1,2}', 2),('2016-01-02','{1,2}',4);
Following query works great to show names on a given date:
SELECT log_date, sales_guys, ARRAY_AGG(guy), sales
FROM sales
JOIN guys ON
guys.guy_id = ANY(sales.sales_guys)
GROUP BY log_date, sales_guys, sales
ORDER BY log_date ASC;
log_date | sales_guys | array_agg | sales
------------+------------+------------+-------
2016-01-01 | {1,2} | {john,joe} | 2
2016-01-02 | {1,2} | {john,joe} | 4
Following query problematically gives me a name per date per guy, so here each name twice, and so on):
SELECT sales_guys, ARRAY_AGG(guy), SUM(sales) AS sales
FROM sales
JOIN guys ON guys.guy_id = ANY(sales.sales_guys)
GROUP BY sales_guys;
Yields:
sales_guys | array_agg | sales
------------+---------------------+-------
{1,2} | {john,joe,john,joe} | 12
Is there a way to somehow reduce the ARRAY_AGG
call to give only the unique names?
Upvotes: 0
Views: 13669
Reputation: 656241
There is no kind of order you can trust without ORDER BY
. Except that elements of arrays, when unnested, come in array order. If your query does more with the result, it may be re-ordered, though.
You an simply add ORDER BY
to any aggregate function in Postgres:
SELECT s.sales_guys, ARRAY_AGG(DISTINCT g.guy ORDER BY g.guy) AS names, SUM(s.sales) AS sum_sales
FROM sales s
JOIN guys g ON g.guy_id = ANY(s.sales_guys)
GROUP BY s.sales_guys;
But that's obviously not the original order of array elements. And the query has other issues ... Neither IN
nor = ANY()
care about order of elements in the set, list or array on the right side:
For this task (attention to the details!):
Get the total sales
per array sales_guys
, where the order of elements makes a difference (arrays '{1,2}'
and '{2,1}'
are not the same) and sales_guys
has neither duplicate nor NULL elements. Add an array of resolved names in matching order.
Use unnest()
with WITH ORDINALITY
. and aggregate arrays before you resolve names, that's cheaper and less error prone.
SELECT s.*, g.
FROM (
SELECT sales_guys, sum (sales) AS total_sales -- aggregate first in subquery
FROM sales
GROUP BY 1
) s
, LATERAL (
SELECT array_agg(guy ORDER BY ord) AS names -- order by original order
FROM unnest(s.sales_guys) WITH ORDINALITY sg(guy_id, ord) -- with order of elements
LEFT JOIN guys g USING (guy_id) -- LEFT JOIN to add NULL for missing guy_id
) g;
The LATERAL
subquery can be joined with unconditional CROSS JOIN
- comma (,
) is shorthand notation - because the aggregate in the subquery guarantees a result for every row. Else you'd use LEFT JOIN LATERAL .. ON true
.
Detailed explanation:
Upvotes: 2
Reputation: 17711
You can use DISTINCT
inside of the aggregate:
SELECT sales_guys, ARRAY_AGG(DISTINCT guy), SUM(sales) AS sales FROM sales JOIN guys ON guys.guy_id = ANY(sales.sales_guys) GROUP BY sales_guys;
Upvotes: 0