Reputation: 183
I'm trying to understand how mysql queries work with and without GROUP BY.
Imagine I have a table of FILM_ACTORS, where each ACTOR_ID has a corresponding FILM_ID. So the same actor participates in N different movies.
I want to select the actors that participate 20 movies:
SELECT actor_id FROM film_actor GROUP BY actor_id HAVING COUNT(film_id) = 20;
This query works and returns the actor_ids that participate in 20 movies. But what if I just did:
SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 20;
Why does this query only returns values if I equal it to the SIZE of film_actor table (5463):
SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 5463;
In this case it returns me actor_id = 1. Why? Is it selecting film_ids without considering the corresponding actor_ids?
Upvotes: 3
Views: 499
Reputation: 108400
There's no GROUP BY
in the second query. The usage of the aggregate function COUNT
in the HAVING
clause means the query is going to return at most one row.
Compare to this query:
SELECT actor_id, COUNT(film_id) FROM film_actor
Which returns a single row, e.g.
actor_id COUNT(film_id)
-------- --------------
42 5463
(NOTE: By default, MySQL will return a result for that query. Other databases would reject that query, and raise an "non aggregate not in group by" type error. The issue is that the reference to actor_id
in the SELECT
list. To get that query to work in other databases, we would need to remove actor_id
from the SELECT list. We can get MySQL to behave the same way, if we set sql_mode
to include ONLY_FULL_GROUP_BY
.)
Note that the value returned for actor_id
is a value from "some row". It's not deterministic which row that value is returned from, it can be from any row. The value returned for COUNT
is from the entire table.
If you want the COUNT
for each actor, you need a GROUP BY
clause, as in your first query.
SELECT actor_id, COUNT(film_id) FROM film_actor GROUP BY actor_id
Starting with that query as a basis, you can add a HAVING
clause. And you can also remove the COUNT(film_id)
from the SELECT
list. But you can't remove the GROUP BY
, without affecting what gets returned for COUNT(film_id)
.
Upvotes: 4
Reputation: 30893
So lets say you had:
+---------------------------------+
| actor_id | actor_name | film_id |
+---------------------------------+
| 4 | John | 3 |
| 4 | John | 4 |
| 5 | Alex | 3 |
+---------------------------------+
When you run:
SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id;
We would get:
+------------------+
| actor_id | Films |
+------------------+
| 4 | 2 |
| 5 | 1 |
+------------------+
So, we can do:
SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id WHERE Films = 2;
This should just return actor_id of 4.
Upvotes: 3
Reputation: 21513
GROUP BY groups the results by the value of the following columns, normally used with aggregate functions (eg, COUNT).
So your first query returns one row for each value of actor_id, and the HAVING limits the results to those where the count is 20
Without a GROUP BY clause the aggregate function acts on all rows. So your 2nd query is selecting the actor_id where the count of films is 20, but without grouping the count is 5463 (ie, number of rows on the table). The actor_id returned in this situation is not determined (ie, could be any of them).
Upvotes: 5