Reputation: 10396
Surprise -- this is a perfectly valid query in MySQL:
select X, Y from someTable group by X
If you tried this query in Oracle or SQL Server, you’d get the natural error message:
Column 'Y' is invalid in the select list because it is not contained in
either an aggregate function or the GROUP BY clause.
So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.
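For comparison, here is a minimal sketch of the portable way to say which Y you want: wrap it in an aggregate so the engine has nothing to choose. It uses Python's built-in sqlite3 module only as a convenient stand-in for running the SQL (an assumption made for the sake of a self-contained example, not MySQL itself), and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in to run standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (X TEXT, Y INTEGER)")

# Hypothetical sample data: group 'a' has two different Y values.
conn.executemany(
    "INSERT INTO someTable VALUES (?, ?)",
    [("a", 3), ("a", 1), ("b", 2)],
)

# MIN(Y) states explicitly which Y is wanted for each X, so the query
# is valid under strict GROUP BY rules.
explicit_rows = conn.execute(
    "SELECT X, MIN(Y) AS Y FROM someTable GROUP BY X ORDER BY X"
).fetchall()

print(explicit_rows)  # [('a', 1), ('b', 2)]
```

The same SELECT statement should also be accepted by Oracle and SQL Server, since every selected column is either grouped or aggregated.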
There’s even a MySQL configuration parameter to turn off this “looseness”. http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard. http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html
My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?
Upvotes: 73
Views: 33627
Reputation: 15849
Unfortunately almost all the SQL varieties have situations where they break ANSI and have unpredictable results.
It sounds to me like they intended it to be treated like the "FIRST(Y)" function that many other systems have.
More than likely, this construct is something that the MySQL team regret, but don't want to stop supporting because of the number of applications that would break.
Upvotes: 3
Reputation: 415
According to this page (the 5.0 online manual), it's for better performance and user convenience.
Upvotes: 26
Reputation: 1
The MySQL reference manual says: "You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group."
I suggest reading this page of the MySQL reference manual: http://dev.mysql.com/doc/refman/5.5/en//group-by-extensions.html
Upvotes: 0
Reputation: 1
It's actually a very useful tool that the other fields don't have to be in an aggregate function when you group by a field. You can control which row is returned by ordering the rows first and grouping them afterwards. For instance, if I wanted a user's login information and wanted to see the last time that user logged in, here is how I would do it.
Tables
USERS
user_id | name
USER_LOGIN_HISTORY
user_id | date_logged_in
USER_LOGIN_HISTORY has multiple rows per user, so joining USERS to it returns many rows. As I am only interested in the last entry, I would do this:
select
    user_id,
    name,
    date_logged_in
from (
    select
        u.user_id,
        u.name,
        ulh.date_logged_in
    from users as u
    join user_login_history as ulh
        on u.user_id = ulh.user_id
    where u.user_id = 1234
    order by ulh.date_logged_in desc
) as table1
group by user_id
This would return one row with the name of the user and the last time that user logged in.
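One caveat: the MySQL manual states that the server is free to choose any value from each group and that adding an ORDER BY clause cannot influence the choice, so the order-then-group pattern is not guaranteed to return the latest login. A sketch of an alternative that does not depend on row order is to ask for MAX(date_logged_in) directly. The example runs the SQL through Python's built-in sqlite3 module as a stand-in engine (an assumption for the sake of a runnable example); the sample user and timestamps are invented, and dates are stored as ISO-8601 text so that MAX compares them correctly.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine; table and column
# names follow the answer above, the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_login_history (user_id INTEGER, date_logged_in TEXT);
    INSERT INTO users VALUES (1234, 'alice');
    INSERT INTO user_login_history VALUES
        (1234, '2015-01-01 08:00:00'),
        (1234, '2015-03-15 09:30:00'),
        (1234, '2015-02-10 17:45:00');
""")

# MAX() picks the latest login explicitly; every other selected column
# appears in GROUP BY, so no row order is relied upon.
latest_login = conn.execute("""
    SELECT u.user_id, u.name, MAX(ulh.date_logged_in) AS date_logged_in
    FROM users AS u
    JOIN user_login_history AS ulh ON u.user_id = ulh.user_id
    WHERE u.user_id = 1234
    GROUP BY u.user_id, u.name
""").fetchall()

print(latest_login)  # [(1234, 'alice', '2015-03-15 09:30:00')]
```

This form should also run with ONLY_FULL_GROUP_BY enabled, because nothing is left for the engine to pick.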
Upvotes: -1
Reputation: 599
MySQL treats this as a single-column DISTINCT when you use GROUP BY without an aggregate function. With the other options, you either make the whole result distinct or have to use subqueries, etc. The question is whether the results are truly predictable.
Also, good info is in this thread.
Upvotes: 2
Reputation: 6622
I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.owner_id) AS posts
FROM user
LEFT OUTER JOIN post ON post.owner_id=user.id
GROUP BY user.id
In this case user.name will always be the same for a given user.id, so it is convenient not to require user.name in the GROUP BY clause (although, as you say, there is definite scope for problems).
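For engines that insist on the strict rule, the usual workaround is simply to list the dependent column in the GROUP BY as well. Since user.name is determined by user.id, the groups do not change. The sketch below runs that variant through Python's built-in sqlite3 module as a stand-in engine (an assumption for a self-contained example); the table contents are made up, and posts are counted via post.owner_id so that users without posts get 0.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (owner_id INTEGER);
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1), (1);
""")

# user.name is added to GROUP BY; because it depends on user.id,
# this yields the same groups as grouping by user.id alone.
post_counts = conn.execute("""
    SELECT user.id, user.name, COUNT(post.owner_id) AS posts
    FROM user
    LEFT OUTER JOIN post ON post.owner_id = user.id
    GROUP BY user.id, user.name
    ORDER BY user.id
""").fetchall()

print(post_counts)  # [(1, 'ann', 2), (2, 'bob', 0)]
```

The cost is only a longer GROUP BY list, which is the inconvenience the MySQL extension lets you skip.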
Upvotes: 23