Tobias Baumeister
Tobias Baumeister

Reputation: 2147

Improving a query/table: Long "Copying to tmp table"

I have the following query in mysql:

  SELECT t.ID
    FROM forum_categories c, forum_threads t 
    INNER JOIN forum_posts p ON p.ID = t.Last_post 
    WHERE t.ForumID=36 OR (c.Parent=36 AND t.ForumID=c.ID) 
ORDER BY t.Last_post DESC LIMIT 1

The table forum_threads looks like this:
ID --- Title --- ForumID -- Last_post (ID of Last forum Post)

And the table forum_posts like this:
ID --- Content -- Author

And lastly the table forum_categories like this:
ID -- Name --- Parent (Another forum_categoriey)

(both simplified)
The table forum_posts contains currently ~ 200,000 rows and the table forum_threads ~ 5,000 rows

Somehow these queries take about 1-2 seconds sometimes.
I already indexed "Last_post", but it doesn't help.
The "Copying to tmp table" duration makes ~ 99% of the whole execution time of this query

I also increased the tmp_table_size and the sort_buffer_size but it still makes no difference.

Any ideas?

Upvotes: 0

Views: 363

Answers (1)

Abhik Chakraborty
Abhik Chakraborty

Reputation: 44864

The query should be much better when you have something as

select t.id 
from forum_threads t 
inner join forum_posts p ON p.ID = t.Last_post 
inner join forum_categories c on t.ForumID=c.ID 
WHERE t.ForumID=36 OR c.Parent=36 
ORDER BY t.Last_post 
DESC LIMIT 1

Now for small set of data it will look very nice and the query time will be really good.

So the next thing how to improve it for large set of data and the answer is INDEX.

  • There are 2 joins happening

  • There is a where clause as well

So you will need to index the table properly to avoid full table scan.

You can run the following command to see the current indexes on the tables as

show indexes from forum_threads;
show indexes from forum_posts ;
show indexes from forum_categories ;

The above commands will show you the indexes associated with the tables. Now consider the fact that there is no index so we will need to do the indexing as

alter table forum_threads add index Last_post_idx (`Last_post`);
alter table forum_posts add index ID_idx (`ID`);
alter table forum_categories add index ID_idx (`ID`);

and finally

alter table forum_threads add index ForumID_idx (`ForumID`);
alter table forum_categories add index Parent_idx (`Parent`);

Now we have indexes on the tables and query should be way faster.

NOTE : The joining keys between 2 tables should have identical data type and size so that the indexes works. For example

inner join forum_posts p ON p.ID = t.Last_post  

the ID and Last_post should be having same data type and size in the tables.

Now we still have an issue on the query it uses OR condition and even with the proper index the query will try to scan the full table in some cases.

WHERE t.ForumID=36 OR c.Parent=36 

So how to get rid of this, well sometime UNION works better in this cases. Meaning you run one query with a condtion

WHERE t.ForumID=36

followed by UNION same query with a different where condition as

WHERE c.Parent=36

But the optimization needs more insight on the tables and the possible queries that are going to be executed on those tables.

The explanation above is just an idea how we can improve the performance of the query and there are many possibilities in real time and these could be handled while having the complete table structures and the possible queries that are going to be applied on them.

Upvotes: 1

Related Questions