Reputation: 1659
We have a query that is currently killing our database, and I know there has to be a way to optimize it. We have 3 tables: lists, list_items, and items.
We've been using the following query to display a simple HTML table that shows each list and a number of attributes related to the list including averages of attributes of the included list items:
select object_id, user_id, slug, title, description, items,
city, state, country, created, updated,
(select AVG(rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_rating',
(select AVG(avg_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_avg_rating',
(select AVG(length) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_length',
(select AVG(difficulty_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby LIMIT $start,$step
The reason we haven't broken this up into one query that fetches the lists and then separate lookups that pull the averages for each list is that we want the user to be able to sort on the average columns (e.g. 'order by avg_difficulty').
Hopefully my explanation makes sense. There has to be a much more efficient way to do this and I'm hoping that a MySQL guru out there can point me in the right direction. Thanks!
Upvotes: 1
Views: 270
Reputation: 4158
Besides indexing, even a cursory analysis shows that your query contains a lot of redundancy that your DBMS's optimizer may not be able to spot (SQL is a redundant language; it admits too many equivalent, syntactically different expressions for the same query; this is a known and documented problem - see for example SQL redundancy and DBMS performance, by Fabian Pascal).
I will rewrite your query below to highlight that redundancy:
let LI =
select object_id from list_items where list_id=lists.object_id
in
select object_id, user_id, slug, title, description, items, city, state, country, created, updated,
(select AVG(rating) from items where object_id IN LI AND status="A") as 'avg_rating',
(select AVG(avg_rating) from items where object_id IN LI AND status="A") as 'avg_avg_rating',
(select AVG(length) from items where object_id IN LI AND status="A") as 'avg_length',
(select AVG(difficulty_rating) from items where object_id IN LI AND status="A") as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby
LIMIT $start, $step
Note: this is only the first step in refactoring that beast.
I wonder why people rarely - if ever - use views, even if only to simplify SQL queries. They help in writing more manageable and refactorable queries.
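For instance, a view along these lines could encapsulate the per-list averages (a sketch only, not the exact factoring above; the view name is made up, and the table/column names are taken from the query in the question):
CREATE VIEW list_item_averages AS
SELECT li.list_id,
       AVG(i.rating)            AS avg_rating,
       AVG(i.avg_rating)        AS avg_avg_rating,
       AVG(i.length)            AS avg_length,
       AVG(i.difficulty_rating) AS avg_difficulty
FROM list_items li
JOIN items i ON i.object_id = li.object_id
WHERE i.status = 'A'
GROUP BY li.list_id;
The original query would then collapse to a single join against the view:
SELECT l.*, v.avg_rating, v.avg_avg_rating, v.avg_length, v.avg_difficulty
FROM lists l
LEFT JOIN list_item_averages v ON v.list_id = l.object_id
WHERE l.user_id = $user_id AND l.status = 'A'
ORDER BY $orderby
LIMIT $start, $step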
Upvotes: 0
Reputation: 238296
It looks like you can replace all the subqueries with joins:
SELECT l.object_id,
l.user_id,
<other columns from lists>,
AVG(i.rating) as avgrating,
AVG(i.avg_rating) as avgavgrating,
<other averages>
FROM lists l
LEFT JOIN list_items li
ON li.list_id = l.object_id
LEFT JOIN items i
ON i.object_id = li.object_id
AND i.status = 'A'
WHERE l.user_id = $user_id AND l.status = 'A'
GROUP BY l.object_id, l.user_id, <other columns from lists>
That would save a lot of work for the DB engine.
Upvotes: 6
Reputation: 28194
A couple of things to consider:
Make sure that all of your joins are indexed on both sides. For example, you join list_items.list_id=lists.object_id in several places. list_id and object_id should both have indexes on them.
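For example (a sketch; the index names are made up and should be adapted to your schema):
ALTER TABLE lists      ADD INDEX idx_lists_user_status (user_id, status);
ALTER TABLE list_items ADD INDEX idx_list_items_list_id (list_id);
ALTER TABLE list_items ADD INDEX idx_list_items_object_id (object_id);
ALTER TABLE items      ADD INDEX idx_items_object_status (object_id, status);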
Have you done any research as to how much the averages actually vary? You might benefit from having a worker thread (or cronjob) calculate the averages periodically rather than putting the load on your RDBMS every time you run this query. You'd need to store the averages in a separate table, of course...
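A sketch of what that separate table and its periodic refresh could look like (the table and column names here are hypothetical):
CREATE TABLE list_averages (
    list_id        INT PRIMARY KEY,
    avg_rating     DECIMAL(10,4),
    avg_avg_rating DECIMAL(10,4),
    avg_length     DECIMAL(10,4),
    avg_difficulty DECIMAL(10,4)
);
-- Run from the worker thread / cronjob; REPLACE overwrites any existing row per list.
REPLACE INTO list_averages (list_id, avg_rating, avg_avg_rating, avg_length, avg_difficulty)
SELECT li.list_id,
       AVG(i.rating),
       AVG(i.avg_rating),
       AVG(i.length),
       AVG(i.difficulty_rating)
FROM list_items li
JOIN items i ON i.object_id = li.object_id AND i.status = 'A'
GROUP BY li.list_id;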
Also, are you using status as an enum or a varchar? The cardinality of an enum would be much lower; consider switching to that type if you have a limited range of values for the status column.
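If you do switch, it could look something like this (the second value is just a placeholder for whatever other status codes you actually use):
ALTER TABLE items MODIFY status ENUM('A', 'X');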
-aj
Upvotes: 1
Reputation: 11039
Here's how to find the bottleneck:
Add the keyword EXPLAIN before the SELECT. This will cause the engine to output how it plans to execute the SELECT.
To learn more about Query Optimization with this method see: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
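For example, prefix the query from the question with EXPLAIN. A trimmed-down sketch, keeping one of the correlated subqueries and substituting placeholder values for the PHP variables:
EXPLAIN
SELECT object_id, user_id, slug, title,
    (select AVG(rating) from items
     where object_id IN
       (select object_id from list_items where list_id=lists.object_id)
     AND status="A"
    ) as 'avg_rating'
FROM lists
WHERE user_id=123 AND status="A"
ORDER BY created LIMIT 0,25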
Upvotes: 1
Reputation: 20073
That's one hell of a query... you should probably edit your question and reformat the query so it's a bit more readable, although given how complex it is, I'm not sure that's possible.
Anyway, the simple answer here is to denormalize your database a bit and cache all of your averages on the lists table itself in indexed decimal columns. All those subqueries are killing you.
The hard part, and what you'll have to figure out, is how to keep those averages updated. A generally easy way is to store the count of all items and the sum of their values in two separate fields. Any time an item is added, increment the count by 1 and the sum by the item's value, then set avg_field = sum_field / count_field.
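A sketch of that incremental update, assuming hypothetical rating_sum, rating_count, and avg_rating columns on lists (the literal values stand in for the new item's rating and the list's id):
-- When an item rated 4.5 is added to list 42:
UPDATE lists
SET rating_sum   = rating_sum + 4.5,
    rating_count = rating_count + 1,
    avg_rating   = (rating_sum + 4.5) / (rating_count + 1)
WHERE object_id = 42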
Upvotes: 0