vinieth
vinieth

Reputation: 1314

MySQL Optimiser - cost planner doesn't know when DuplicateWeedout Strategy creates disk table

This is my sample query

Select table1.id 
from table1 
where table.id in (select table2.id 
                    from table2 
                    where table2.id in (select table3.id 
                                        from table3)
                    ) 
order by table1.id  
limit 100

On checking the optimiser trace for the above query. Optimiser trace cost

  1. DUPLICATE-WEEDOUT strategy - Cost: 1.08e7
  2. FIRST MATCH strategy - Cost: 1.85e7

As DUPLICATE-WEEDOUT cost is less, mysql took DUPLICATE-WEEDOUT strategy for the above query.

Seems everything good in join_optimization part right. But finally, after checking the join_execution part. DUPLICATE-WEEDOUT usually creates temp table. But here as the heap-size is not enough for temp table, it went on creating ondisk temp table(converting_tmp_table_to_ondisk).

Due to disk temp table my query execution became slower.


So what happened here?

Optimiser trace doesn't calculate the cost of disk table in join-optimisation part itself. If disk table cost was calculated, it would be higher than first match. Then final_semijoin_strategy would be FIRST-MATCH strategy, with this my query would have been faster.

Is there any way MYSQL calculate the cost of disk table in join-optimisation part itself or any other work around for this particular issue?

MYSQ-5.7, INNODB


Note: This is a very dynamic query where multiple condition will add based on request in query. So I have done optimising the query in all possible manner. And finally stuck with this disk table cost issue. Kindly avoid optimising the query(like changing the query structure, forcing first-match strategy). And for increasing the heap size(Im not sure much about it, in different forum many said it might bring different issue in other queries)

Upvotes: 7

Views: 289

Answers (1)

Rick James
Rick James

Reputation: 142366

IN( SELECT ... ) has been notoriously inefficient. Try to avoid it.

The query, as presented, is probably equivalent to

SELECT  t1.id
    FROM  t1
    JOIN  t2 USING(id)
    JOIN  t3 USING(id)
    ORDER BY  id
    LIMIT  100

which will optimize nicely.

This formulation should not need to build any temp table, much less a disk-based one.

Upvotes: 0

Related Questions