user2514376
user2514376

Reputation: 147

postgresql 9.3. Group by without all columns

I have a problem with the following query:

SELECT 
        ee.id 
        ee.column2
        ee.column3,
        ee.column4, 
        SUM(ee.column5)
      FROM 
         table1 ee 
         LEFT JOIN table2 epc ON ee.id = epc.id
      WHERE 
         ee.id (6050) 
      GROUP BY ee.id

WHERE column id is the primary key. On version 8.4, the query returns an error saying that column2, column3 and column4 don't exist in the group by clause.

This same query executes successfully on version 9.3.

Does anybody know why?

Upvotes: 8

Views: 6612

Answers (3)

Khorkrak
Khorkrak

Reputation: 4017

This does not hold for views however.

So this works fine:

CREATE TABLE table1(
  id int NOT null PRIMARY KEY,
  column2 int,
  column3 int,
  column4 int,
  column5 int
); 

SELECT id, column2, column3, column4,  SUM(column5)
  FROM table1
 GROUP BY id;

If you instead replace querying against the table with a view that's simply a select of all of the columns in the table:

CREATE VIEW view1 AS 
  SELECT id, column2, column3, column4, column5
    FROM table1;

SELECT id, column2, column3, column4,  SUM(column5)
  FROM view1
 GROUP BY id;

It'll fail with (PostgreSQL 11.18):

ERROR:  column "view1.column2" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT id, column2, column3, column4,  SUM(column5)
                   ^
SQL state: 42803
Character: 12

Upvotes: 0

Mark Rotteveel
Mark Rotteveel

Reputation: 109263

The SQL-99 standard introduced the concept of functionally dependent columns. A column is functionally dependent on another column when that other column (or set of columns) already uniquely defines it. So if you have a table with a primary key, then all other columns in that table are functionally dependent on that primary key.

So when using a GROUP BY, and you include the primary key of a table, then you do not need to include the other columns of that same table in the GROUP BY-clause as they have already been uniquely identified by the primary key.

This is also documented in GROUP BY Clause:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

(emphasis mine)

Upvotes: 2

user330315
user330315

Reputation:

This was introduced in 9.1

Quote from the release notes:

Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key, the result is unambiguous.

It is also explained with examples in the chapter about group by:

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.

In a nutshell: if the group by clause contains a column that uniquely identifies the rows, it is sufficient to include that column only.

Upvotes: 6

Related Questions