XForCE07
XForCE07

Reputation: 1234

GROUP BY without aggregate function

I am trying to understand GROUP BY (new to oracle dbms) without aggregate function.
How does it operate?
Here is what i have tried.

EMP table on which i will run my SQL.
EMP TABLE

SELECT ename , sal
FROM emp
GROUP BY ename , sal

Result

SELECT ename , sal  
FROM emp  
GROUP BY ename;  

Result

ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action:
Error at Line: 397 Column: 16

SELECT ename , sal  
FROM emp  
GROUP BY sal;  

Result

ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action: Error at Line: 411 Column: 8

SELECT empno , ename , sal  
FROM emp  
GROUP BY sal , ename;  

Result

ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action: Error at Line: 425 Column: 8

SELECT empno , ename , sal  
FROM emp  
GROUP BY empno , ename , sal;  

Result

So, basically the number of columns have to be equal to the number of columns in the GROUP BY clause, but i still do not understand why or what is going on.

Upvotes: 98

Views: 284432

Answers (11)

Naman Chandak
Naman Chandak

Reputation: 19

I have a complex query

SELECT rcost.rId, resource.rId, resource.usedAt, empt.empId, 
joint.joiningDate, salaryt.salId, deptt.dName
FROM rcost
RIGHT JOIN resource ON rcost.rId = resource.rId
RIGHT JOIN empt ON resource.empId = empt.empId
LEFT JOIN joint ON empt.empId = joint.empId
LEFT JOIN salaryt ON empt.empId = salaryt.empId
RIGHT JOIN deptt ON salaryt.salId = deptt.salId
WHERE empt.empId <> 1
GROUP BY rcost.rId, resource.rId, resource.usedAt, empt.empId, joint.joiningDate, salaryt.salId, deptt.dName;

This works because: All columns in the SELECT clause are either part of the GROUP BY clause or are aggregated (though in this case, there are no aggregate functions, so all selected columns must be in the GROUP BY clause).

But in the following query,

SELECT rcost.rId, resource.rId, resource.usedAt, empt.empId, 
joint.joiningDate, salaryt.salId, deptt.dName
FROM rcost
RIGHT JOIN resource ON rcost.rId = resource.rId
RIGHT JOIN empt ON resource.empId = empt.empId
LEFT JOIN joint ON empt.empId = joint.empId
LEFT JOIN salaryt ON empt.empId = salaryt.empId
RIGHT JOIN deptt ON salaryt.salId = deptt.salId
WHERE empt.empId <> 1
GROUP BY rcost.rId, resource.usedAt, empt.empId, joint.joiningDate, 
salaryt.salId, deptt.dName;

This doesn't work because: The resource.rId column is in the SELECT clause but not in the GROUP BY clause. SQL requires that if a column is in the SELECT clause and is not used within an aggregate function, it must be included in the GROUP BY clause. Since resource.rId is selected but not grouped, the database engine doesn’t know how to handle it when it tries to group the rows by the other columns.

But there is an exception, the following query will work,

SELECT rcost.rId, resource.rId, resource.usedAt, empt.empId, 
joint.joiningDate, salaryt.salId, deptt.dName
FROM rcost
RIGHT JOIN resource ON rcost.rId = resource.rId
RIGHT JOIN empt ON resource.empId = empt.empId
LEFT JOIN joint ON empt.empId = joint.empId
LEFT JOIN salaryt ON empt.empId = salaryt.empId
RIGHT JOIN deptt ON salaryt.salId = deptt.salId
WHERE empt.empId <> 1
GROUP BY rcost.rId, resource.rId, empt.empId, joint.joiningDate, 
salaryt.salId, deptt.dName;

Many relational database management systems (RDBMS) like MySQL or PostgreSQL allow this kind of query to run, even though it technically violates the strict SQL standard. They might allow non-aggregated columns that are not in the GROUP BY clause to appear in the SELECT clause if the column is functionally dependent on the grouped columns.

Upvotes: 0

Jack Tsin
Jack Tsin

Reputation: 61

For anyone trying to group data (from foreign tables as an example) like a json object with nested arrays of data you can achieve this in sql with array_agg (you can also use this in conjunction with json_build_object to create a json object with key-value pairs).

As a refference, I found helpful this video on yt: https://www.youtube.com/watch?v=A6N1h9mcJf4

-- Edit

If you want to have a nested array inside a nested array, you could do it by using array.

In the following example, 'variation_images' (subquery 2 - in relation to the variation table) are nested under the 'variation' query (subquery 1 - in relation to product table) which is nested under the product query (main query):

SELECT product.title, product.slug, product.description,

ARRAY(SELECT jsonb_build_object(
'var_id', variation.id, 'var_name', variation.name, 'images', 

ARRAY(SELECT json_build_object('img_url', variation_images.images) 
FROM variation_images WHERE variation_images.variation_id = variation.id)
)
FROM variation WHERE variation.product_id = product.id)

FROM product 

Upvotes: 1

oerkelens
oerkelens

Reputation: 5161

Given this data:

Col1  Col2  Col3
 A     X     1
 A     Y     2
 A     Y     3
 B     X     0
 B     Y     3
 B     Z     1

This query:

SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2, Col3

Would result in exactly the same table.

However, this query:

SELECT Col1, Col2 FROM data GROUP BY Col1, Col2

Would result in:

Col1  Col2
 A     X  
 A     Y  
 B     X  
 B     Y  
 B     Z  

Now, a query:

SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2

Would create a problem: the line with A, Y is the result of grouping the two lines

 A     Y     2
 A     Y     3

So, which value should be in Col3, '2' or '3'?

Normally you would use a GROUP BY to calculate e.g. a sum:

SELECT Col1, Col2, SUM(Col3) FROM data GROUP BY Col1, Col2

So in the line, we had a problem with we now get (2+3) = 5.

Grouping by all your columns in your select is effectively the same as using DISTINCT, and it is preferable to use the DISTINCT keyword word readability in this case.

So instead of

SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2, Col3

use

SELECT DISTINCT Col1, Col2, Col3 FROM data

Upvotes: 65

sancz612
sancz612

Reputation: 33

As an addition

basically the number of columns have to be equal to the number of columns in the GROUP BY clause

is not a correct statement.

  • Any attribute which is not a part of GROUP BY clause can not be used for selection
  • Any attribute which is a part of GROUP BY clause can be used for selection but not mandatory.

Upvotes: 2

Tobberoth
Tobberoth

Reputation: 9527

That's how GROUP BY works. It takes several rows and turns them into one row. Because of this, it has to know what to do with all the combined rows where there have different values for some columns (fields). This is why you have two options for every field you want to SELECT : Either include it in the GROUP BY clause, or use it in an aggregate function so the system knows how you want to combine the field.

For example, let's say you have this table:

Name | OrderNumber
------------------
John | 1
John | 2

If you say GROUP BY Name, how will it know which OrderNumber to show in the result? So you either include OrderNumber in group by, which will result in these two rows. Or, you use an aggregate function to show how to handle the OrderNumbers. For example, MAX(OrderNumber), which means the result is John | 2 or SUM(OrderNumber) which means the result is John | 3.

Upvotes: 153

user2839702
user2839702

Reputation: 15

I know you said you want to understand group by if you have data like this:

COL-A  COL-B  COL-C  COL-D
  1      Ac      C1     D1
  2      Bd      C2     D2
  3      Ba      C1     D3
  4      Ab      C1     D4
  5      C       C2     D5

And you want to make the data appear like:

COL-A  COL-B  COL-C  COL-D
  4      Ab      C1     D4
  1      Ac      C1     D1
  3      Ba      C1     D3
  2      Bd      C2     D2
  5      C       C2     D5

You use:

select * from table_name
order by col-c,colb

Because I think this is what you intend to do.

Upvotes: 0

Munawar Shah Afridi
Munawar Shah Afridi

Reputation: 41

Use sub query e.g:

SELECT field1,field2,(SELECT distinct field3 FROM tbl2 WHERE criteria) AS field3
FROM tbl1 GROUP BY field1,field2

OR

SELECT DISTINCT field1,field2,(SELECT distinct field3 FROM tbl2 WHERE criteria) AS field3
FROM tbl1

Upvotes: 4

David Aldridge
David Aldridge

Reputation: 52336

The only real use case for GROUP BY without aggregation is when you GROUP BY more columns than are selected, in which case the selected columns might be repeated. Otherwise you might as well use a DISTINCT.

It's worth noting that other RDBMS's do not require that all non-aggregated columns be included in the GROUP BY. For example in PostgreSQL if the primary key columns of a table are included in the GROUP BY then other columns of that table need not be as they are guaranteed to be distinct for every distinct primary key column. I've wished in the past that Oracle did the same as it would have made for more compact SQL in many cases.

Upvotes: 14

Srini V
Srini V

Reputation: 11355

Let me give some examples.

Consider this data.

CREATE TABLE DATASET ( VAL1 CHAR ( 1 CHAR ),
                   VAL2 VARCHAR2 ( 10 CHAR ),
                   VAL3 NUMBER );

INSERT INTO
      DATASET ( VAL1, VAL2, VAL3 )
VALUES
      ( 'b', 'b-details', 2 );

INSERT INTO
      DATASET ( VAL1, VAL2, VAL3 )
VALUES
      ( 'a', 'a-details', 1 );

INSERT INTO
      DATASET ( VAL1, VAL2, VAL3 )
VALUES
      ( 'c', 'c-details', 3 );

INSERT INTO
      DATASET ( VAL1, VAL2, VAL3 )
VALUES
      ( 'a', 'dup', 4 );

INSERT INTO
      DATASET ( VAL1, VAL2, VAL3 )
VALUES
      ( 'c', 'c-details', 5 );

COMMIT;

Whats there in table now

SELECT * FROM DATASET;

VAL1 VAL2             VAL3
---- ---------- ----------
b    b-details           2
a    a-details           1
c    c-details           3
a    dup                 4
c    c-details           5

5 rows selected.

--aggregate with group by

SELECT
      VAL1,
      COUNT ( * )
FROM
      DATASET A
GROUP BY
      VAL1;

VAL1   COUNT(*)
---- ----------
b             1
a             2
c             2

3 rows selected.

--aggregate with group by multiple columns but select partial column

SELECT
      VAL1,
      COUNT ( * )
FROM
      DATASET A
GROUP BY
      VAL1,
      VAL2;

VAL1  
---- 
b             
c             
a             
a             

4 rows selected.

--No aggregate with group by multiple columns

SELECT
      VAL1,
      VAL2
FROM
      DATASET A
GROUP BY
      VAL1,
      VAL2;

    VAL1  
    ---- 
    b    b-details
    c    c-details
    a    dup
    a    a-details

    4 rows selected.

--No aggregate with group by multiple columns

SELECT
      VAL1
FROM
      DATASET A
GROUP BY
      VAL1,
      VAL2;

    VAL1  
    ---- 
    b
    c
    a
    a

    4 rows selected.

You have N columns in select (excluding aggregations), then you should have N or N+x columns

Upvotes: 5

Mister_Tom
Mister_Tom

Reputation: 1564

You're experiencing a strict requirement of the GROUP BY clause. Every column not in the group-by clause must have a function applied to reduce all records for the matching "group" to a single record (sum, max, min, etc).

If you list all queried (selected) columns in the GROUP BY clause, you are essentially requesting that duplicate records be excluded from the result set. That gives the same effect as SELECT DISTINCT which also eliminates duplicate rows from the result set.

Upvotes: 27

ogres
ogres

Reputation: 3690

If you have some column in SELECT clause , how will it select it if there is several rows ? so yes , every column in SELECT clause should be in GROUP BY clause also , you can use aggregate functions in SELECT ...

you can have column in GROUP BY clause which is not in SELECT clause , but not otherwise

Upvotes: 3

Related Questions