I can explain how broadcast join works, and this article explains it well: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-joins-broadcast.html
But I have failed to find an article that explains the inner workings of shuffle hash join and sort merge join.
Can anyone please give the step-by-step algorithm for those two?
Here is some good material:
Notice that since Spark 2.3 the default value of spark.sql.join.preferSortMergeJoin has been changed to true.
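The two algorithms the question asks about can be sketched in plain Python. This is a simplified single-process model, not Spark's implementation, and it rests on assumptions: rows are hypothetical (key, payload) tuples, Python's built-in hash() stands in for the engine's partitioning hash, the right input is assumed to be the smaller (build) side, and join keys are assumed to be comparable so they can be sorted. Both joins begin with the same shuffle (exchange) that sends equal keys from both sides to the same partition index; they differ only in how each pair of co-located partitions is joined.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row shape for illustration: (join key, payload).
Row = Tuple[int, str]


def shuffle(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Exchange step shared by both joins: send each row to partition
    hash(key) % num_partitions so equal keys from both sides land in the
    same partition index. Python's hash() stands in for the engine's hash."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[0]) % num_partitions].append(row)
    return parts


def shuffle_hash_join(left: List[Row], right: List[Row], num_partitions: int = 4):
    out = []
    for lp, rp in zip(shuffle(left, num_partitions), shuffle(right, num_partitions)):
        # Build phase: in-memory hash table over this partition of the build
        # side (assumed here to be `right`, the smaller input).
        table: Dict[int, List[str]] = defaultdict(list)
        for k, v in rp:
            table[k].append(v)
        # Probe phase: stream the other side's partition and emit every match.
        for k, v in lp:
            for rv in table.get(k, []):
                out.append((k, v, rv))
    return out


def sort_merge_join(left: List[Row], right: List[Row], num_partitions: int = 4):
    out = []
    for lp, rp in zip(shuffle(left, num_partitions), shuffle(right, num_partitions)):
        # Sort phase: order each partition by join key.
        lp = sorted(lp, key=lambda r: r[0])
        rp = sorted(rp, key=lambda r: r[0])
        # Merge phase: advance two cursors through the sorted partitions.
        i = j = 0
        while i < len(lp) and j < len(rp):
            lk, rk = lp[i][0], rp[j][0]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Equal keys: find the run of matching right rows, pair it with
                # every left row in the matching left run, then skip both runs.
                j_end = j
                while j_end < len(rp) and rp[j_end][0] == lk:
                    j_end += 1
                while i < len(lp) and lp[i][0] == lk:
                    for _, rv in rp[j:j_end]:
                        out.append((lk, lp[i][1], rv))
                    i += 1
                j = j_end
    return out


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
    right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(shuffle_hash_join(left, right)))
    # [(2, 'b', 'x'), (2, 'c', 'x'), (3, 'd', 'y'), (3, 'd', 'z')]
    print(sorted(sort_merge_join(left, right)) == sorted(shuffle_hash_join(left, right)))
    # True
```

The trade-off shows up in the inner loops: the hash join must hold one side's partition in memory as a hash table, while the merge join only needs both partitions sorted, which an engine can do while spilling to disk. When spark.sql.join.preferSortMergeJoin is true, Spark prefers sort merge join over shuffled hash join.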