Guerrilla
Guerrilla

Reputation: 14906

Why does this query need aggregate?

I am trying to join 2 tables into one table book (they have one-to-one relation) and then nest a one-to-many relation ebooks

If I try to nest the one-to-many results it works fine:

select r.*, array_agg(e) e_books from gardner_record r
left join gardner_e_book_record e on r.ean_number = e.physical_edition_ean
group by r.id

This works as expected. I get the results back but there is a few more columns I need to add to gardner_record from another table with one-to-one relationship called gardner_inventory_item

This is what I have tried:

select book.*, array_agg(e) e_books from (
    select  r.*,
           i.price,
           i.discount,
           i.free_stock,
           i.report,
           i.report_date
    from gardner_record r
             inner join gardner_inventory_item i on r.ean_number = i.ean
    ) book
left join gardner_e_book_record e on book.ean_number = e.physical_edition_ean
group by book.id

But it doesn't work. The query gives error that modified_date (a column on gardner_record) needs to be in group by or aggregated.

Why is this? I make a single table in subquery. Then I left join the nested records as ebooks and use the array_agg() in select. Shouldn't that mean it is all aggregated? It works in first query.

What am I missing conceptually here?

Upvotes: 0

Views: 115

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658102

The first query does not produce the same error because (I assume since you did not disclose) id is the PRIMARY KEY of gardner_record, and the PK covers all columns of the same table in the GROUP BY clause since Postgres 9.1. Related:

In the second query, you have formed the derived table book before you aggregate. In the outer SELECT, id no longer has that property (the derived table book has no PK) and it only covers itself in the GROUP BY list - and there are also additional columns from gardner_inventory_item which are not covered at all in the GROUP BY list.

This would work:

SELECT r.*
     , i.price
     , i.discount
     , i.free_stock
     , i.report
     , i.report_date
     , array_agg(e) AS e_books
FROM   gardner_record r
JOIN   gardner_inventory_item i ON r.ean_number = i.ean
LEFT   JOIN  gardner_e_book_record e
GROUP  BY r.id, i.ean;  -- Assuming these are PKs of each table

Or, assuming we can aggregate on physical_edition_ean just as well (since you mentioned that 1:1 relationship):

SELECT r.*
     , i.price
     , i.discount
     , i.free_stock
     , i.report
     , i.report_date
     , array_agg(e) AS e_books
FROM   gardner_record r
JOIN   gardner_inventory_item i ON r.ean_number = i.ean
LEFT   JOIN (
   SELECT physical_edition_ean, array_agg(e) AS e_books
   FROM   gardner_e_book_record e
   GROUP  BY 1
   ) e ON e.physical_edition_ean = r.ean_number

Simpler and faster.

Upvotes: 1

Henry
Henry

Reputation: 15691

In general any column that does not appear in an aggregate function must be included in the GROUP BY clause, however if the selected table has a primary key, then simply including the primary key in the GROUP BY clause allows all columns of that table to be selected without aggregation, which you make use of in the first query. However this approach does not work when the primary key is wrapped in a subquery since the Postgres optimizer cannot tell that the column was originally a primary key anymore.

To get around this, you could either list out all the columns of the book subquery explicitly (which might be cumbersome).

Alternatively, you could join and group the gardner_record and gardner_e_book_record tables in a subquery and then join the gardner_inventory_item afterwards. Basically just putting your first query as the subquery and then joining on the gardner_inventory_item table. Example:

select book_e_ebooks, i.* from (
        select r.*, array_agg(e) e_books from gardner_record r
        left join gardner_e_book_record e on r.ean_number = e.physical_edition_ean
        group by r.id
    ) book_e_ebooks
left join gardner_inventory_item i on book_e_ebooks.ean_number = i.ean;

But you really don't need a subquery at all:

select r.*, i.*, array_agg(e) e_books from gardner_record r
left join gardner_inventory_item i on r.ean_number = i.ean
left join gardner_e_book_record e on r.ean_number = e.physical_edition_ean
group by r.id, i.id

Upvotes: 1

Related Questions