Pavel Orekhov

Reputation: 2203

How do shuffle hash join and sort merge join work exactly?

I can explain how broadcast join works and this article explains it well: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-joins-broadcast.html

But I have failed to find an article that explains the inner workings of shuffle hash join and sort merge join.

Can anyone please give the step-by-step algorithm for those two?

Upvotes: 8

Views: 18546

Answers (1)

Alon

Reputation: 11945

Here is some good material:

Shuffle Hash Join
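
As a rough illustration of the idea, here is a minimal sketch in plain Python, not Spark's actual implementation. It assumes an inner equi-join on rows represented as dicts, and the names `shuffle` and `shuffle_hash_join` are hypothetical. Both sides are first repartitioned by a hash of the join key so that equal keys land in the same partition; then, per partition, a hash table is built from one side (the build side, normally the smaller one) and the other side is streamed through it.

```python
from collections import defaultdict

def shuffle(rows, key, num_partitions):
    """Route each row to a partition chosen by hashing its join key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

def shuffle_hash_join(left, right, key, num_partitions=4):
    """Inner equi-join of two lists of dicts; `right` is the build side."""
    joined = []
    # Step 1: shuffle both sides with the same hash function, so rows
    # with equal keys end up in partitions with the same index.
    left_parts = shuffle(left, key, num_partitions)
    right_parts = shuffle(right, key, num_partitions)
    for left_part, right_part in zip(left_parts, right_parts):
        # Step 2 (build): hash table from join key to build-side rows.
        table = defaultdict(list)
        for row in right_part:
            table[row[key]].append(row)
        # Step 3 (probe): stream the other side and look up each key.
        for row in left_part:
            for match in table.get(row[key], []):
                joined.append({**match, **row})
    return joined
```

The build side of each partition has to fit in memory, which is why this strategy only makes sense when one side is considerably smaller than the other.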

Sort Merge Join
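
As a rough illustration of the idea, here is a minimal sketch in plain Python, not Spark's actual implementation. It shows what happens inside a single partition, assuming both sides have already been shuffled by join key so that equal keys share a partition (that step is not shown). The function name `sort_merge_join` and the dict-based rows are assumptions for the example. Both sides are sorted by the join key, then two cursors walk them in step and emit every pairing for each run of equal keys.

```python
def sort_merge_join(left, right, key):
    """Inner equi-join of two lists of dicts within one partition."""
    # Step 1: sort both sides by the join key.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    joined = []
    i = j = 0
    # Step 2: advance whichever cursor points at the smaller key.
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Keys match: find the end of the run of equal keys on the right.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row in this run with every right row in it.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    joined.append({**left[i], **r})
                i += 1
            j = j_end
    return joined
```

Unlike the hash-based approach, no in-memory hash table of a whole side is needed; only the current run of equal keys has to be buffered, and sorted data can spill to disk.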

Note that in Spark 2.3 and later, spark.sql.join.preferSortMergeJoin defaults to true, so Spark prefers sort merge join over shuffle hash join when both are applicable.

Upvotes: 12
