marie20
marie20

Reputation: 783

Order of the tables in a JOIN

In spark-sql I have a query that uses several tables (both large & small) in Joins. My question is - does the order of these tables matter with respect to query performance ? For e.g.

select larger.col1, smaller.col2
from larger
Left Join smaller
on larger.key = smaller.key
Left Join larger2
on larger2.key2 = smaller.key2

I have searched online but didn't get a conclusive answer. So, will the performance vary if I change the order of the left vs right tables ?

Upvotes: 2

Views: 1208

Answers (1)

marie20
marie20

Reputation: 783

The join order seems to be changed for optimization by Spark.

There could be :

  • Reorder JOIN optimizer
  • Reorder JOIN optimizer - star schema
  • Reorder JOIN optimizer - cost based optimization

The following appears to shed some light on this topic:

https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer-star-schema/read https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer/read https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer-cost-based-optimization/read

Upvotes: 2

Related Questions