Pedro Barros
Pedro Barros

Reputation: 183

How does this mySQL Query works?

I'm trying to understand how mysql queries work with and without GROUP BY.

Imagine I have a table of FILM_ACTORS, where each ACTOR_ID has a corresponding FILM_ID. So the same actor participates in N different movies.

I want to select the actors that participate 20 movies:

SELECT actor_id FROM film_actor GROUP BY actor_id HAVING COUNT(film_id) = 20;

This query works and returns the actor_ids that participate in 20 movies. But what if I just did:

SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 20;

Why does this query only returns values if I equal it to the SIZE of film_actor table (5463):

SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 5463;

In this case it returns me actor_id = 1. Why? Is it selecting film_ids without considering the corresponding actor_ids?

Upvotes: 3

Views: 499

Answers (3)

spencer7593
spencer7593

Reputation: 108400

There's no GROUP BY in the second query. The usage of the aggregate function COUNT in the HAVING clause means the query is going to return at most one row.

Compare to this query:

SELECT actor_id, COUNT(film_id) FROM film_actor

Which returns a single row, e.g.

actor_id  COUNT(film_id)
--------  --------------
      42            5463

(NOTE: By default, MySQL will return a result for that query. Other databases would reject that query, and raise an "non aggregate not in group by" type error. The issue is that the reference to actor_id in the SELECT list. To get that query to work in other databases, we would need to remove actor_id from the SELECT list. We can get MySQL to behave the same way, if we set sql_mode to include ONLY_FULL_GROUP_BY.)

Note that the value returned for actor_id is a value from "some row". It's not deterministic which row that value is returned from, it can be from any row. The value returned for COUNT is from the entire table.


If you want the COUNT for each actor, you need a GROUP BY clause, as in your first query.

    SELECT actor_id, COUNT(film_id) FROM film_actor GROUP BY actor_id

Starting with that query as a basis, you can add a HAVING clause. And you can also remove the COUNT(film_id) from the SELECT list. But you can't remove the GROUP BY, without affecting what gets returned for COUNT(film_id).

Upvotes: 4

Twisty
Twisty

Reputation: 30893

So lets say you had:

+---------------------------------+
| actor_id | actor_name | film_id |
+---------------------------------+
|        4 |       John |       3 |
|        4 |       John |       4 |
|        5 |       Alex |       3 |
+---------------------------------+

When you run:

SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id;

We would get:

+------------------+
| actor_id | Films |
+------------------+
|        4 |     2 |
|        5 |     1 |
+------------------+

So, we can do:

SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id WHERE Films = 2;

This should just return actor_id of 4.

Upvotes: 3

Kickstart
Kickstart

Reputation: 21513

GROUP BY groups the results by the value of the following columns, normally used with aggregate functions (eg, COUNT).

So your first query returns one row for each value of actor_id, and the HAVING limits the results to those where the count is 20

Without a GROUP BY clause the aggregate function acts on all rows. So your 2nd query is selecting the actor_id where the count of films is 20, but without grouping the count is 5463 (ie, number of rows on the table). The actor_id returned in this situation is not determined (ie, could be any of them).

Upvotes: 5

Related Questions