Ludisposed
Ludisposed

Reputation: 1769

"Invalid use of group function" in MySQL

Using MySQL, my database looks like this:

+-----------------------+--------------------+------+--------+
|      moviesInfo       |                    |      |        |
+-----------------------+--------------------+------+--------+
| title                 | director           | year | oscars |
+-----------------------+--------------------+------+--------+
| BoBoiBoy: The Movie   | Nizam Razak        | 2016 | 0      |
| Inception             | Christopher Nolan  | 2010 | 4      |
| Interstellar          | Christopher Nolan  | 2014 | 1      |
| Munna Bhai M.B.B.S.   | Rajkumar Hirani    | 2003 | 0      |
| My Dear Brother       | Ertem Egilmez      | 1973 | 0      |
| Rocky                 | John G. Avildsen   | 1976 | 3      |
| The Nights of Cabiria | Federico Fellini   | 1957 | 1      |
| The Sixth Sense       | M. Night Shyamalan | 1999 | 6      |
| Tokyo Story           | Yasujirô Ozu       | 1953 | 0      |
| Yojimbo               | Akira Kurosawa     | 1961 | 1      |
+-----------------------+--------------------+------+--------+


select director from moviesInfo
where MAX(year) > 2000 and SUM(oscars) > 2
group by director;

With the above query I try to get the director who has a movie later than the year 2000 and has more than 2 oscars, but I get an error

Invalid use of group function

Being still new to MySQL I don't fully understand this.

Expected output would be

+-------------------+
|     director      |
+-------------------+
| Christopher Nolan |
+-------------------+

Upvotes: 0

Views: 45

Answers (1)

axiac
axiac

Reputation: 72425

The GROUP BY aggregate functions cannot be used in the WHERE clause. They should stay in the HAVING clause, after the GROUP BY because they operate on the sets of values generated by grouping.

Your query should be:

SELECT director 
FROM moviesInfo
GROUP BY director
HAVING MAX(year) > 2000 AND SUM(oscars) > 2

Update

On OP's request, this is, roughly, how the above SQL query is executed:

  1. The GROUP BY clause tell the server to group the matching rows (all rows because there is no WHERE clause to filter them). The groups are as follows:

    +-----------------------+--------------------+------+--------+
    | title                 | director           | year | oscars |
    +-----------------------+--------------------+------+--------+
    | BoBoiBoy: The Movie   | Nizam Razak        | 2016 | 0      |
    +-----------------------+--------------------+------+--------+
    | Inception             | Christopher Nolan  | 2010 | 4      |
    | Interstellar          | Christopher Nolan  | 2014 | 1      |
    +-----------------------+--------------------+------+--------+
    | Munna Bhai M.B.B.S.   | Rajkumar Hirani    | 2003 | 0      |
    +-----------------------+--------------------+------+--------+
    | My Dear Brother       | Ertem Egilmez      | 1973 | 0      |
    +-----------------------+--------------------+------+--------+
    | Rocky                 | John G. Avildsen   | 1976 | 3      |
    +-----------------------+--------------------+------+--------+
    | The Nights of Cabiria | Federico Fellini   | 1957 | 1      |
    +-----------------------+--------------------+------+--------+
    | The Sixth Sense       | M. Night Shyamalan | 1999 | 6      |
    +-----------------------+--------------------+------+--------+
    | Tokyo Story           | Yasujirô Ozu       | 1953 | 0      |
    +-----------------------+--------------------+------+--------+
    | Yojimbo               | Akira Kurosawa     | 1961 | 1      |
    +-----------------------+--------------------+------+--------+
    
  2. The HAVING clause makes it compute MAX(year) and SUM(oscars) for each group. The data now look like:

    +-----------------------+--------------------+------+--------+-----------+-------------+
    | title                 | director           | year | oscars | MAX(year) | SUM(oscars) |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | BoBoiBoy: The Movie   | Nizam Razak        | 2016 | 0      | 2016      | 0           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | Inception             | Christopher Nolan  | 2010 | 4      | 2014      | 5           |
    | Interstellar          | Christopher Nolan  | 2014 | 1      |           |             |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | Munna Bhai M.B.B.S.   | Rajkumar Hirani    | 2003 | 0      | 2003      | 0           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | My Dear Brother       | Ertem Egilmez      | 1973 | 0      | 1973      | 0           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | Rocky                 | John G. Avildsen   | 1976 | 3      | 1976      | 3           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | The Nights of Cabiria | Federico Fellini   | 1957 | 1      | 1957      | 1           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | The Sixth Sense       | M. Night Shyamalan | 1999 | 6      | 1999      | 6           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | Tokyo Story           | Yasujirô Ozu       | 1953 | 0      | 1953      | 0           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | Yojimbo               | Akira Kurosawa     | 1961 | 1      | 1961      | 1           |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    
  3. Next, the conditions on HAVING are evaluated and the data set looks like:

    +-----------------------+--------------------+------+--------+-----------+-------------+
    | title                 | director           | year | oscars | MAX(year) | SUM(oscars) |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    | Inception             | Christopher Nolan  | 2010 | 4      | 2014      | 5           |
    | Interstellar          | Christopher Nolan  | 2014 | 1      |           |             |
    +-----------------------+--------------------+------+--------+-----------+-------------+
    
  4. Now the SELECT clause is used to generate the result set you expect.

Please note that the explanation above is the theoretical model behind the GROUP BY queries. In the real world, MySQL (and the other RDBMSes) mixes and runs the above steps in a different order. It combines the WHERE and HAVING clauses (if possible), processes one row at a time and tries to filter out as many rows as possible as soon as possible, in order to finish the processing as quickly as possible.

Remarks

The SELECT clause on a GROUP BY query can contain only expressions that:

  1. are also present in the GROUP BY clause;
    for example, director;
  2. are GROUP BY aggregate functions;
    for example SUM(oscars), COUNT(*), AVG(year) etc. There is no requirement for them to be present in other clauses;
  3. are functionally dependent on columns that are present in the GROUP BY clause;
    for example, if the GROUP BY clause contains a column that is the PK of a table, any column of that table is allowed in the SELECT clause because all the rows present in a group will have the same value for all columns of that table.

Let's remark that the columns title, year and oscars cannot be in the SELECT clause because the do not match any of the above conditions. Adding them to the SELECT clause produces invalid SQL. The reason is obvious: there are two distinct values for title in the second group. Which one to choose?

Before version 5.7.5, MySQL used to accept such invalid SQL queries but it reserved its right to return indeterminate results. This means the result can change if the query is executed against the same data set on a different machine (or after the data is restored from backup).

Since version 5.7.5, MySQL rejects invalid GROUP BY queries.

Upvotes: 3

Related Questions