Anurag Sharma
Anurag Sharma

Reputation: 2605

What is Inner Like join?

In the Spark source code for join strategies, code comments mention for Broadcast hash join (BHJ):

BHJ is not supported for full outer join. For right outer join, we only can broadcast the left side. For left outer, left semi, left anti and the internal join type ExistenceJoin, we only can broadcast the right side. For inner like join, we can broadcast both sides. Normally, BHJ can perform faster than the other join algorithms when the broadcast side is small. However, broadcasting tables is a network-intensive operation. It could cause OOM or perform worse than the other join algorithms, especially when the build/broadcast side is big.

Could you please explain what does the code comments mean by

"inner-like join"

code link

Upvotes: 1

Views: 233

Answers (2)

Anurag Sharma
Anurag Sharma

Reputation: 2605

Finally found in the code: joinTypes.scala

InnerLike includes: Inner and Cross joins.

sealed abstract class InnerLike extends JoinType {
  def explicitCartesian: Boolean
}

case object Inner extends InnerLike {
  override def explicitCartesian: Boolean = false
  override def sql: String = "INNER"
}

case object Cross extends InnerLike {
  override def explicitCartesian: Boolean = true
  override def sql: String = "CROSS"
}

Upvotes: 1

Jakumi
Jakumi

Reputation: 8374

according to a doc for the dataset join operators innerlike is used for INNER and CROSS joins.

You can also find that Spark SQL uses the following two families of joins:

  • InnerLike with Inner and Cross
  • LeftExistence with LeftSemi, LeftAnti and ExistenceJoin

Upvotes: 3

Related Questions