Reputation: 2605
In the Spark source code for join strategies, code comments mention for Broadcast hash join (BHJ):
BHJ is not supported for full outer join. For right outer join, we only can broadcast the left side. For left outer, left semi, left anti and the internal join type ExistenceJoin, we only can broadcast the right side. For inner like join, we can broadcast both sides. Normally, BHJ can perform faster than the other join algorithms when the broadcast side is small. However, broadcasting tables is a network-intensive operation. It could cause OOM or perform worse than the other join algorithms, especially when the build/broadcast side is big.
Could you please explain what does the code comments mean by
"inner-like join"
Upvotes: 1
Views: 233
Reputation: 2605
Finally found in the code: joinTypes.scala
InnerLike includes: Inner and Cross joins.
sealed abstract class InnerLike extends JoinType {
def explicitCartesian: Boolean
}
case object Inner extends InnerLike {
override def explicitCartesian: Boolean = false
override def sql: String = "INNER"
}
case object Cross extends InnerLike {
override def explicitCartesian: Boolean = true
override def sql: String = "CROSS"
}
Upvotes: 1
Reputation: 8374
according to a doc for the dataset join operators innerlike is used for INNER and CROSS joins.
You can also find that Spark SQL uses the following two families of joins:
- InnerLike with Inner and Cross
- LeftExistence with LeftSemi, LeftAnti and ExistenceJoin
Upvotes: 3