Mark
Mark

Reputation: 8441

Do MAX() improves performance of a query?

In this post:SQL Query to get the data.

the first answer is:

SELECT students.student_id,student_name,father_name,mother_name,
           COUNT(student_addresses.student_id) AS total_addresses,    
           COUNT(student_phones.student_id) AS total_phones
     FROM students,student_phones,student_addresses
     WHERE students.student_id = student_phones.student_id AND
           students.student_id = student_addresses.student_id AND
           students.student_id = 7
    GROUP BY BY students.student_id,student_name,father_name,mother_name;

while the 2nd is:

SELECT s.student_id,
       max(s.student_name) student_name,
       max(s.father_name) father_name,
       max(s.mother_name) mother_name,
       COUNT(distinct a.student_address_id) total_addresses,    
       COUNT(distinct p.student_phone_id) total_phones
FROM students s
LEFT JOIN student_phones p ON s.student_id = p.student_id
LEFT JOIN student_addresses a ON s.student_id = a.student_id
WHERE s.student_id = 7
GROUP BY s.student_id

Now, the question: are there any significant differences between the two query when it comes to performance? Does the use of MAX() affects the execution time for the 2nd query?

I try to google for answer but no luck. I want a clear and specific explanation for this one.

Upvotes: 0

Views: 83

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270593

The two queries are not doing the same thing, even when the four columns are all unique (students.student_id, student_name, father_name, mother_name).

From a logic perspective the two queries are not the same. The first will return no rows for a student that has either no phones or no addresses. The second will return such students. Also, the count values are different (depending on the data).

From a performance perspective, the major difference is:

       COUNT(student_addresses.student_id) AS total_addresses,    
       COUNT(student_phones.student_id) AS total_phones

versus:

       COUNT(distinct student_addresses.student_id) AS total_addresses,    
       COUNT(distinct student_phones.student_id) AS total_phones

Using count(distinct) is more expensive, because the SQL engine has to maintain lists of all values. In extreme cases, these values may exceed memory and even result in more I/O operations. For a count(), the engine just adds one to a number instead of doing fiddly list operations.

Similarly, the overhead of min() and max() is minimal -- the engine does a comparison and overwrite a value. This is a small iota of extra work that is not likely to affect performance. Balancing this out is the fact that the group by key is shorter. Shorter keys can have an effect on performance, depending on the underlying algorithm being used. Regardless, both queries have the same amount of data being processed by the group by, so the overall difference in key length (regardless of algorithm) is likely to be minimal.

In short, any difference in performance is due to the count(distinct) and not to the max(). You should decide if that is what you really need and write the query according. The second form is better because it uses ANSI standard join syntax.

Upvotes: 1

Related Questions