MySQL performance comparison between joining table and derived table

Question

I have this two queries following and noticed they have a huge performance difference

Query1

SELECT count(distinct b.id) FROM tableA as a
    LEFT JOIN tableB as b on a.id = b.aId 
    GROUP BY a.id

Query2

SELECT count(distinct b.id) FROM tableA as a
    LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId 
    GROUP BY a.id

the queries are basically joining one table to another and I noticed that Query1 takes about 80ms whereas Query2 takes about 2sec with thousands of data in my system. Could anyone explain me why this happens ? and if it's a wise choice to use only Query2 style whenever I am forced to use it ? or is there a better way to do the same thing but better than Query2 ?

Tim Biegeleisen · Accepted Answer

When you replace tableB with (SELECT * FROM tableB) you are forcing the query engine to materialize a subquery, or intermediate table result. In other words, in the second query, you aren't actually joining directly to tableB, you are joining to some intermediate table. As a result of this, any indices which might have existed on tableB to make the query faster would not be available. Based on your current example, I see no reason to use the second version.

Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.

MySQL performance comparison between joining table and derived table

Answers (1)

Related Questions