Aggregate a single column in query with many columns

Question

Is there a proper way to aggregate a single column when I have many other columns in the query?

I've tried this answer which works, but my query has become a lot more verbose.

My current query looks like this:

SELECT t1.foo1, t1.foo2, t2.foo3, t2.foo4, string_agg(t3.aggregated_field, ', ')
FROM tbl1 t1
LEFT JOIN tbl2 t2 ON t1.id = t2.fkeyid
LEFT JOIN tbl3 t3 ON t2.id = t3.fkeyid
GROUP BY t1.foo1, t1.foo2, t2.foo3, t2.foo4, t2.foo5, t2.foo6
ORDER BY t2.foo5, t2.foo6

The query has many more fields and LEFT JOINs, the important part is that all these fields have 1 to 1 or 1 to 0 relationship except one field that is 1 to n which I want to aggregate, represented by t3.aggregated_field in the pseudo-query above.

As I'm using an aggregate function, all fields listed in the SELECT and ORDER BY must be either aggregated or part of the GROUP BY clause. This makes my query way more verbose than it already is.

That is, assuming foo1 is a primary key, when this field is repeated, all others except aggregated_field are also equal. I want these repeated rows as a single row result with the aggregated field value. (basically a select distinct with an aggregated column)

Is there a better way to do this (without having to put all other fields in the GROUP BY) or should I just iterate over the result set in my back-end executing a query for each row fetching this 1 to n relationship?

The server is running PostgreSQL 9.1.9, more specifically:

PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54), 64-bit

Erwin Brandstetter · Accepted Answer

Simple query

This can be much simpler with PostgreSQL 9.1 or later. As explained in this closely related answer:

PGError: ERROR: aggregates not allowed in WHERE clause on a AR query of an object and its has_many objects

It is enough to GROUP BY the primary key of a table. Since:

foo1 is a primary key

.. you can simplify your example to:

SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ')
FROM   tbl1
GROUP  BY 1
ORDER  BY foo7, foo8;  -- have to be spelled out, since not in select list!

Query with multiple tables

However, since you have:

many more fields and LEFT JOINs, the important part is that all these fields have 1 to 1 or 1 to 0 relationship except one field that is 1 to n which I want to aggregate

.. it should be faster and simpler to aggregate first, join later:

SELECT t1.foo1, t1.foo2, ...
     , t2.bar1, t2.bar2, ...
     , a.aggregated_col 
FROM   tbl1 t1
LEFT   JOIN tbl2 t2 ON ...
...
LEFT   JOIN (
   SELECT some_id, string_agg(agg_col, ', ') AS aggregated_col
   FROM   agg_tbl a ON ...
   GROUP  BY some_id
   ) a ON a.some_id = ?.some_id
ORDER  BY ...

This way the big portion of your query does not need aggregation at all.

I recently provided a test case in an SQL Fiddle to prove the point in this related answer:

PostgreSQL - order by an array

Since you are referring to this related answer: No, DISTINCT is not going to help at all in this case.

Aggregate a single column in query with many columns

Answers (2)

Simple query

Query with multiple tables

Related Questions