Reputation: 70139
Is there a proper way to aggregate a single column when I have many other columns in the query?
I've tried this answer which works, but my query has become a lot more verbose.
My current query looks like this:
SELECT t1.foo1, t1.foo2, t2.foo3, t2.foo4, string_agg(t3.aggregated_field, ', ')
FROM tbl1 t1
LEFT JOIN tbl2 t2 ON t1.id = t2.fkeyid
LEFT JOIN tbl3 t3 ON t2.id = t3.fkeyid
GROUP BY t1.foo1, t1.foo2, t2.foo3, t2.foo4, t2.foo5, t2.foo6
ORDER BY t2.foo5, t2.foo6
The query has many more fields and LEFT JOIN
s, the important part is that all these fields have 1 to 1 or 1 to 0 relationship except one field that is 1 to n which I want to aggregate, represented by t3.aggregated_field
in the pseudo-query above.
As I'm using an aggregate function, all fields listed in the SELECT
and ORDER BY
must be either aggregated or part of the GROUP BY
clause. This makes my query way more verbose than it already is.
That is, assuming foo1
is a primary key, when this field is repeated, all others except aggregated_field
are also equal. I want these repeated rows as a single row result with the aggregated field value. (basically a select distinct
with an aggregated column)
Is there a better way to do this (without having to put all other fields in the GROUP BY
) or should I just iterate over the result set in my back-end executing a query for each row fetching this 1 to n relationship?
The server is running PostgreSQL 9.1.9, more specifically:
PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54), 64-bit
Upvotes: 5
Views: 8945
Reputation: 656291
This can be much simpler with PostgreSQL 9.1 or later. As explained in this closely related answer:
It is enough to GROUP BY
the primary key of a table. Since:
foo1 is a primary key
.. you can simplify your example to:
SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ')
FROM tbl1
GROUP BY 1
ORDER BY foo7, foo8; -- have to be spelled out, since not in select list!
However, since you have:
many more fields and LEFT JOINs, the important part is that all these fields have 1 to 1 or 1 to 0 relationship except one field that is 1 to n which I want to aggregate
.. it should be faster and simpler to aggregate first, join later:
SELECT t1.foo1, t1.foo2, ...
, t2.bar1, t2.bar2, ...
, a.aggregated_col
FROM tbl1 t1
LEFT JOIN tbl2 t2 ON ...
...
LEFT JOIN (
SELECT some_id, string_agg(agg_col, ', ') AS aggregated_col
FROM agg_tbl a ON ...
GROUP BY some_id
) a ON a.some_id = ?.some_id
ORDER BY ...
This way the big portion of your query does not need aggregation at all.
I recently provided a test case in an SQL Fiddle to prove the point in this related answer:
Since you are referring to this related answer: No, DISTINCT
is not going to help at all in this case.
Upvotes: 6
Reputation: 125204
If the main problem is that the fields (foox) are computed then this can help:
SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ')
FROM tbl1
GROUP BY 1, 2, 3, 4, 5, 6
ORDER BY 5, 6
The 1, 2...
are the fields in the order they appear in the select list.
Upvotes: 1