Gokul

Reputation: 493

How to Join Multiple Columns in Spark SQL using Java for filtering in DataFrame

I tried using

a.join(b, a.col("x").equalTo(b.col("x")) && a.col("y").equalTo(b.col("y")), "inner")

But Java throws an error saying && is not allowed here.

Upvotes: 13

Views: 18001

Answers (2)

user15059143

Reputation: 21

If you want to join on multiple columns, you can do something like this:

a.join(b,scalaSeq, joinType)

You can store your column names in a Java List and convert it to a Scala Seq. Conversion of a Java List to a Scala Seq:

Seq<String> scalaSeq = JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();

Example: a = a.join(b, scalaSeq, "inner");

Note: a dynamic set of join columns is easily supported this way.
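
As a rough sketch of the whole approach (assuming a and b are Datasets that both have columns named "x" and "y", as in the question; joinColumns and joined are illustrative names):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.collection.JavaConverters;
import scala.collection.Seq;

// Column names to join on; with this form the join is an equi-join
// and each shared column appears only once in the result.
List<String> joinColumns = Arrays.asList("x", "y");
Seq<String> scalaSeq = JavaConverters
        .asScalaIteratorConverter(joinColumns.iterator())
        .asScala()
        .toSeq();
Dataset<Row> joined = a.join(b, scalaSeq, "inner");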

Upvotes: 1

zero323

Reputation: 330413

Spark SQL provides a group of methods on Column, marked as java_expr_ops, which are designed for Java interoperability. It includes the and method (see also or), which can be used here:

a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y")))
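
A minimal sketch of the full join call using this expression (a, b, "x", and "y" are taken from the question; joined is an illustrative name):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Combine the two equality conditions with Column.and instead of Java's &&.
Dataset<Row> joined = a.join(
        b,
        a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y"))),
        "inner");

Note that joining on a Column expression keeps both copies of x and y in the result, unlike the Seq-of-names join, so you may want to drop or rename one side afterwards.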

Upvotes: 35
