Reputation: 783
In spark-sql I have a query that uses several tables (both large & small) in Joins. My question is - does the order of these tables matter with respect to query performance ?
For e.g.
select larger.col1, smaller.col2
from larger
Left Join smaller
on larger.key = smaller.key
Left Join larger2
on larger2.key2 = smaller.key2
I have searched online but didn't get a conclusive answer. So, will the performance vary if I change the order of the left vs right tables ?
Upvotes: 2
Views: 1208
Reputation: 783
The join order seems to be changed for optimization by Spark.
There could be :
The following appears to shed some light on this topic:
https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer-star-schema/read https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer/read https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer-cost-based-optimization/read
Upvotes: 2