Frederik Muller
Frederik Muller

Reputation: 88

How to make subquery fast

for an author overview we are looking for a query which will show all the authors including their best book. The problem with this query is that it lacks speed. There are only about 1500 authors and the query do generate the overview is currently taking 20 seconds.

The main problem seems te be generating the average rating of all the books per person. By selecting the following query, it is still rather fast

select
    person.id as pers_id,
    person.firstname,
    person.suffix,
person.lastname,
    thriller.title,
    year(thriller.orig_pubdate) as year,
    thriller.id as thrill_id,
    count(user_rating.id) as nr,
    AVG(user_rating.rating) as avgrating
from 
    thriller 
inner join 
    thriller_form 
    on thriller_form.thriller_id = thriller.id
inner join 
    thriller_person 
    on thriller_person.thriller_id = thriller.id 
    and thriller_person.person_type_id = 1 
inner join 
    person 
    on person.id = thriller_person.person_id
left outer join
    user_rating
    on user_rating.thriller_id = thriller.id 
    and user_rating.rating_type_id = 1
where thriller.id in
    (select top 1 B.id from thriller as B
    inner join thriller_person as C on B.id=C.thriller_id
    and person.id=C.person_id)
group by
    person.firstname,
    person.suffix,
    person.lastname,
    thriller.title,
    year(thriller.orig_pubdate),
    thriller.id,
    person.id
order by
    person.lastname

However, if we make the subquery a little more complex by selecting the book with the average rating it takes a full 20 seconds to generate a resultset. The query would then be as follows:

select
    person.id as pers_id,
    person.firstname,
    person.suffix,
    person.lastname,
    thriller.title,
    year(thriller.orig_pubdate) as year,
    thriller.id as thrill_id,
    count(user_rating.id) as nr,
    AVG(user_rating.rating) as avgrating
from 
    thriller 
inner join 
    thriller_form 
    on thriller_form.thriller_id = thriller.id
inner join 
    thriller_person 
    on thriller_person.thriller_id = thriller.id 
    and thriller_person.person_type_id = 1 
inner join 
    person 
    on person.id = thriller_person.person_id
left outer join
    user_rating
    on user_rating.thriller_id = thriller.id 
    and user_rating.rating_type_id = 1
where thriller.id in
    (select top 1 B.id from thriller as B
    inner join thriller_person as C on B.id=C.thriller_id
    and person.id=C.person_id
    inner join user_rating as D on B.id=D.thriller_id
    group by B.id
    order by AVG(D.rating))
group by
    person.firstname,
    person.suffix,
    person.lastname,
    thriller.title,
    year(thriller.orig_pubdate),
    thriller.id,
    person.id
    order by
    person.lastname

Anyone got a good suggestion to speed up this query?

Upvotes: 2

Views: 237

Answers (1)

Donal Fellows
Donal Fellows

Reputation: 137707

Calculating an average requires a table scan since you've got to sum the values and then divide by the number of (relevant) rows. This in turn means that you're doing a lot of rescanning; that's slow. Can you calculate the averages once and store them? That would let your query use those pre-computed values. (Yes, it denormalizes the data, but denormalizing for performance is often necessary; there's a trade-off between performance and minimal data.)

It might be appropriate to use a temporary table as the store of the averages.

Upvotes: 2

Related Questions