inhan
inhan

Reputation: 7470

Grouped JOIN query in MySQL - using the ON vs. HAVING condition for filtering (any difference?)

Following are pseudo queries so I don't care about the results, but is there any sort of difference (in performance, result amount etc.) between the following MySQL queries?

Query 1

SELECT u.`username`, COUNT(*) AS 'posts', u.`age`
FROM `users` u
INNER JOIN `posts` p
ON p.`user_id`=u.`id`
GROUP BY u.`id`
HAVING u.`age` > 12

Suppose that the column smth in this query is selected only for the sake of HAVING conditon, so its value is not needed in the results.

Query 2

SELECT u.`username`, COUNT(*) AS 'posts'
FROM `users` u
INNER JOIN `posts` p
ON p.`user_id`=u.`id` AND u.`age` > 12
GROUP BY u.`id`

Upvotes: 1

Views: 65

Answers (2)

Brent Baisley
Brent Baisley

Reputation: 12721

HAVING is always done after all of the JOINs and WHERE filters are done. It is essentially a filter on the result of the query. This is why you can't use it to filter on a field that isn't part of the query.

Your queries would execute differently. Query 2 would be the more efficient since it would filter out users during the JOIN. Query 1 would join with all the users, pulling all that data and GROUPing it, then filtering it on age. Query 2 would GROUP and filter only users under the age of 12, then GROUP it. Less data would require to be read and group in Query 2.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271151

Your queries are totally different. In particular, the second is valid standard SQL syntax. The first uses a MySQL feature that allows any column are allowed in aggregates.

In particular, the column smth in the first query is coming from an arbitrary row in the matching data. If all values are the same on all the rows, the results with the two would be equivalent.

If I assume that u.id is unique for each row in users, then the result sets are equivalent. However, I think a more readable version of the SQL would be:

SELECT u.`username`, COUNT(*) AS 'posts', u.`smth`
FROM `users` u  INNER JOIN
     `posts` p 
     ON p.`user_id`=u.`id`
WHERE u.smth is not null
GROUP BY u.`id`, u.username
HAVING u.`smth` IS NOT NULL 

This makes it clear that you want a separate username on each row and that you don't want smth to be NULL. In terms of performance, all this version is equivalent to your second version, and all three should be pretty much the same.

Upvotes: 1

Related Questions