Reputation: 91
I am learning about databases at college, and have the assignment about finding the minimum avg exam grade for a college course. I have made two solutions, but I hope you experts in here can help me with:
What is the best/most effective solution?
Solution 1:
SELECT courses.name , MIN(avg_grade)
FROM (SELECT courseCode, AVG(grade) as avg_grade
FROM exams
GROUP BY courseCode) avg_grades, courses
WHERE courses.code = avg_grades.courseCode
Solution 2:
SELECT name, min(avg_grade)
FROM (SELECT courses.name, AVG(grade) as avg_grade
FROM courses
LEFT JOIN exams on exams.courseCode = courses.code
GROUP BY courseCode) mytable
And I have been thinking about if JOIN or LEFT JOIN is the correct to use here?
Upvotes: 2
Views: 134
Reputation: 69819
Your two queries are different, so you can't really compare efficiency, your second query will return records for courses with no exam results. Assuming that you switch the LEFT JOIN to an INNER to make the queries comparable, then I would expect the first query to be slightly more efficient since it only has one derived table, and the second has two:
Solution 1:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS FILTERED EXTRA
1 PRIMARY ALL 5 100
1 PRIMARY courses ALL 5 100 Using where; Using join buffer
2 DERIVED exams ALL 5 100 Using temporary; Using filesort
Solution 2:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS FILTERED EXTRA
1 PRIMARY ALL 5 100
2 DERIVED courses ALL 5 100 Using temporary; Using filesort
2 DERIVED exams ALL 5 100 Using where; Using join buffer
I would however check this against your own execution plans as mine was just a quick example on SQL Fiddle.
I would like to take this chance to advise against using the ANSI-89 implicit join syntax, it was replaced over 20 years ago by the explicit join syntax in the ANSI-92 standard. Aaron Bertrand has written a great article on why to switch, I won't duplicate it here.
Another, much more important point though is that your queries are not deterministic, that is to say you could run the same query twice and get 2 different results even with no underlying change in the data.
Taking your second query as an example (although you will notice both queries are wrong on the SQL-Fiddle), you have a subquery MyTable
like so:
SELECT courses.name, AVG(grade) as avg_grade
FROM courses
LEFT JOIN exams on exams.courseCode = courses.code
GROUP BY courseCode
This returned a table like so:
Name | avg_grade
--------+--------------
A | 10
B | 5
C | 6
D | 7
E | 2
You may expect the query as a whole to return:
Name | avg_grade
--------+--------------
E | 2
Since 2 is the lowest average grade, and E is the name that corresponds with that. You would be wrong though, as demonstrated here you can see this actually returns:
Name | avg_grade
--------+--------------
A | 2
What is essentially happening is that MySQL is calculating the minimum avg_grade correctly, but since you have not added any columns to the group by you have given MySQL Carte blanche to choose any value for Name
it chooses.
To get the output you want, I think you need:
SELECT courses.name , MIN(avg_grade)
FROM ( SELECT courseCode, AVG(grade) as avg_grade
FROM exams
GROUP BY courseCode
) avg_grades
INNER JOIN courses
ON courses.code = avg_grades.courseCode
GROUP BY courses.Name;
Or if you only want to the course with the lowest average grade then use:
SELECT courseCode, AVG(grade) as avg_grade
FROM exams
GROUP BY courseCode
ORDER BY avg_grade
LIMIT 1;
Please excuse the laziness of what I am about to do, but I have explained this problem a lot before, and now have a standard response that I post to explain the issue of MySQL grouping. It goes into more detail than the above, and hopefully explains it further.
MySQL Implicit Grouping
I would advise to avoid the implicit grouping offered by MySQL where possible, by this i mean including columns in the select list, even though they are not contained in an aggregate function or the group by clause.
Imagine the following simple table (T):
ID | Column1 | Column2 |
----|---------+----------|
1 | A | X |
2 | A | Y |
In MySQL you can write
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1;
This actually breaks the SQL Standard, but it works in MySQL, however the trouble is it is non-deterministic, the result:
ID | Column1 | Column2 |
----|---------+----------|
1 | A | X |
Is no more or less correct than
ID | Column1 | Column2 |
----|---------+----------|
2 | A | Y |
So what you are saying is give me one row for each distinct value of Column1
, which both results sets satisfy, so how do you know which one you will get? Well you don't, it seems to be a fairly popular misconception that you can add and ORDER BY
clause to influence the results, so for example the following query:
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1
ORDER BY ID DESC;
Would ensure that you get the following result:
ID | Column1 | Column2 |
----|---------+----------|
2 | A | Y |
because of the ORDER BY ID DESC
, however this is not true (as demonstrated here).
The MySQL documents state:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.
So even though you have an order by this does not apply until after one row per group has been selected, and this one row is non-deterministic.
The SQL-Standard does allow columns in the select list not contained in the GROUP BY or an aggregate function, however these columns must be functionally dependent on a column in the GROUP BY. For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table, so the following query conforms to the SQL standard and would run in MySQL and fail in many DBMS currently (At the time of writing Postgresql is the closest DBMS I know of to correctly implementing the standard):
SELECT ID, Column1, Column2
FROM T
GROUP BY ID;
Since ID is unique for each row, there can only be one value of Column1
for each ID, one value of Column2
there is no ambiguity about what to return for each row.
EDIT
From the SQL-2003-Standard (5WD-02-Foundation-2003-09 - page 346) - http://www.wiscorp.com/sql_2003_standard.zip
- If T is a grouped table, then let G be the set of grouping columns of T. In each contained in , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a whose aggregation query is QS.
Upvotes: 5