jantristanmilan
jantristanmilan

Reputation: 4358

Why does MySQL allow you to group by columns that are not selected

I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and its quite good despite its title. However the chapter on group by confuses me

"Grouping data is a simple process. The selected columns (the column list following the SELECT keyword in a query) are the columns that can be referenced in the GROUP BY clause. If a column is not found in the SELECT statement, it cannot be used in the GROUP BY clause. This is logical if you think about it—how can you group data on a report if the data is not displayed? "

How come when I ran this statement in MySQL it works?

select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;

Upvotes: 1

Views: 2361

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

Because the book is wrong.

The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.

This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:

select t.*
from table t
where t.id in (select max(id)
               from table t
               group by thegroup
              );

(Note: There are other ways to write such a query, this is just an example.)

EDIT:

The query that you are suggesting:

select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;

would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.

What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:

select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;

Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.

Upvotes: 4

Bill Karwin
Bill Karwin

Reputation: 562330

You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.

You can make MySQL enforce GROUP BY in a more standard way:

mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;

mysql> select EMP_ID, SALARY
  from EMPLOYEE_PAY_TBL
  group by BONUS;

ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY

Upvotes: 5

Related Questions