Reputation: 1769
Using MySQL, my database looks like this:
+-----------------------+--------------------+------+--------+
| moviesInfo | | | |
+-----------------------+--------------------+------+--------+
| title | director | year | oscars |
+-----------------------+--------------------+------+--------+
| BoBoiBoy: The Movie | Nizam Razak | 2016 | 0 |
| Inception | Christopher Nolan | 2010 | 4 |
| Interstellar | Christopher Nolan | 2014 | 1 |
| Munna Bhai M.B.B.S. | Rajkumar Hirani | 2003 | 0 |
| My Dear Brother | Ertem Egilmez | 1973 | 0 |
| Rocky | John G. Avildsen | 1976 | 3 |
| The Nights of Cabiria | Federico Fellini | 1957 | 1 |
| The Sixth Sense | M. Night Shyamalan | 1999 | 6 |
| Tokyo Story | Yasujirô Ozu | 1953 | 0 |
| Yojimbo | Akira Kurosawa | 1961 | 1 |
+-----------------------+--------------------+------+--------+
select director from moviesInfo
where MAX(year) > 2000 and SUM(oscars) > 2
group by director;
With the above query I try to get the director who has a movie later than the year 2000 and has more than 2 oscars, but I get an error
Invalid use of group function
Being still new to MySQL I don't fully understand this.
Expected output would be
+-------------------+
| director |
+-------------------+
| Christopher Nolan |
+-------------------+
Upvotes: 0
Views: 45
Reputation: 72425
The GROUP BY
aggregate functions cannot be used in the WHERE
clause. They should stay in the HAVING
clause, after the GROUP BY
because they operate on the sets of values generated by grouping.
Your query should be:
SELECT director
FROM moviesInfo
GROUP BY director
HAVING MAX(year) > 2000 AND SUM(oscars) > 2
On OP's request, this is, roughly, how the above SQL query is executed:
The GROUP BY
clause tell the server to group the matching rows (all rows
because there is no WHERE
clause to filter them). The groups are as follows:
+-----------------------+--------------------+------+--------+
| title | director | year | oscars |
+-----------------------+--------------------+------+--------+
| BoBoiBoy: The Movie | Nizam Razak | 2016 | 0 |
+-----------------------+--------------------+------+--------+
| Inception | Christopher Nolan | 2010 | 4 |
| Interstellar | Christopher Nolan | 2014 | 1 |
+-----------------------+--------------------+------+--------+
| Munna Bhai M.B.B.S. | Rajkumar Hirani | 2003 | 0 |
+-----------------------+--------------------+------+--------+
| My Dear Brother | Ertem Egilmez | 1973 | 0 |
+-----------------------+--------------------+------+--------+
| Rocky | John G. Avildsen | 1976 | 3 |
+-----------------------+--------------------+------+--------+
| The Nights of Cabiria | Federico Fellini | 1957 | 1 |
+-----------------------+--------------------+------+--------+
| The Sixth Sense | M. Night Shyamalan | 1999 | 6 |
+-----------------------+--------------------+------+--------+
| Tokyo Story | Yasujirô Ozu | 1953 | 0 |
+-----------------------+--------------------+------+--------+
| Yojimbo | Akira Kurosawa | 1961 | 1 |
+-----------------------+--------------------+------+--------+
The HAVING
clause makes it compute MAX(year)
and SUM(oscars)
for each group. The data now look like:
+-----------------------+--------------------+------+--------+-----------+-------------+
| title | director | year | oscars | MAX(year) | SUM(oscars) |
+-----------------------+--------------------+------+--------+-----------+-------------+
| BoBoiBoy: The Movie | Nizam Razak | 2016 | 0 | 2016 | 0 |
+-----------------------+--------------------+------+--------+-----------+-------------+
| Inception | Christopher Nolan | 2010 | 4 | 2014 | 5 |
| Interstellar | Christopher Nolan | 2014 | 1 | | |
+-----------------------+--------------------+------+--------+-----------+-------------+
| Munna Bhai M.B.B.S. | Rajkumar Hirani | 2003 | 0 | 2003 | 0 |
+-----------------------+--------------------+------+--------+-----------+-------------+
| My Dear Brother | Ertem Egilmez | 1973 | 0 | 1973 | 0 |
+-----------------------+--------------------+------+--------+-----------+-------------+
| Rocky | John G. Avildsen | 1976 | 3 | 1976 | 3 |
+-----------------------+--------------------+------+--------+-----------+-------------+
| The Nights of Cabiria | Federico Fellini | 1957 | 1 | 1957 | 1 |
+-----------------------+--------------------+------+--------+-----------+-------------+
| The Sixth Sense | M. Night Shyamalan | 1999 | 6 | 1999 | 6 |
+-----------------------+--------------------+------+--------+-----------+-------------+
| Tokyo Story | Yasujirô Ozu | 1953 | 0 | 1953 | 0 |
+-----------------------+--------------------+------+--------+-----------+-------------+
| Yojimbo | Akira Kurosawa | 1961 | 1 | 1961 | 1 |
+-----------------------+--------------------+------+--------+-----------+-------------+
Next, the conditions on HAVING
are evaluated and the data set looks like:
+-----------------------+--------------------+------+--------+-----------+-------------+
| title | director | year | oscars | MAX(year) | SUM(oscars) |
+-----------------------+--------------------+------+--------+-----------+-------------+
| Inception | Christopher Nolan | 2010 | 4 | 2014 | 5 |
| Interstellar | Christopher Nolan | 2014 | 1 | | |
+-----------------------+--------------------+------+--------+-----------+-------------+
Now the SELECT
clause is used to generate the result set you expect.
Please note that the explanation above is the theoretical model behind the GROUP BY
queries. In the real world, MySQL (and the other RDBMSes) mixes and runs the above steps in a different order. It combines the WHERE
and HAVING
clauses (if possible), processes one row at a time and tries to filter out as many rows as possible as soon as possible, in order to finish the processing as quickly as possible.
The SELECT
clause on a GROUP BY
query can contain only expressions that:
GROUP BY
clause;director
;GROUP BY
aggregate functions;SUM(oscars)
, COUNT(*)
, AVG(year)
etc. There is no requirement for them to be present in other clauses;GROUP BY
clause;GROUP BY
clause contains a column that is the PK
of a table, any column of that table is allowed in the SELECT
clause because all the rows present in a group will have the same value for all columns of that table.Let's remark that the columns title
, year
and oscars
cannot be in the SELECT
clause because the do not match any of the above conditions. Adding them to the SELECT
clause produces invalid SQL. The reason is obvious: there are two distinct values for title
in the second group. Which one to choose?
Before version 5.7.5, MySQL used to accept such invalid SQL queries but it reserved its right to return indeterminate results. This means the result can change if the query is executed against the same data set on a different machine (or after the data is restored from backup).
Since version 5.7.5, MySQL rejects invalid GROUP BY
queries.
Upvotes: 3