Kris
Kris

Reputation: 23559

MySQL: What happens to non-aggregated fields upon a GROUP BY?

I have very a basic question about the following behavior in MySQL.

Suppose we do the following GROUP BY:

SELECT a, b, SUM(c)
FROM table
GROUP BY b;

What happens to the field a, which is neither aggregated nor is it included in the GROUP BY fields?

Does MySQL just implicitly apply FIRST(a) to a? If so, is this behavior consistent or does it grab a random value out of all values for a?

Upvotes: 3

Views: 1708

Answers (2)

Alex Tartan
Alex Tartan

Reputation: 6826

MySQL > 5.7.5

It's an illegal query.

You will receive an error like:

ERROR 1055 (42000): Expression #1 of SELECT list is not in 
GROUP BY clause and contains nonaggregated column 'a' 
which is not functionally dependent on columns in 
GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them

More details at https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html


MySQL < 5.7.5

Short anwser: it's a valid query, but the server is free to return any value

Reading this https://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html shows:

12.16.3 MySQL Handling of GROUP BY In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:

SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;

For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.

So there is no way of determining what that value will be (if there are more values for that group)

Upvotes: 5

Steve Gray
Steve Gray

Reputation: 460

It's the first result value the query processor gets back from the storage medium, dependant on the chosen query strategy. Technically this is undefined, but your table has no indicies other than it's key, this is reasonably deterministic, but you're at the mercy of the optimiser.

Rely on this at your peril. This is why window functions were invented.

Upvotes: 2

Related Questions