Babu
Babu

Reputation: 891

Join Dataframes dynamically using Spark Scala when JOIN columns differ

Dynamically select multiple columns while joining different Dataframe in scala spark

From the above link , I was able to have the join expression working , but what if the column names are different, we cannot use Seq(columns) and need to join it dynamically. Here left_ds and right_ds are the dataframes which I wanted to join. Below I want to join columns id=acc_id and "acc_no=number"

left_da => id,acc_no,name,ph

right_ds => acc_id,number,location

val joinKeys="id,acc_id|acc_no,number"
val joinKeyPair: Array[(String, String)] = joinKeys.split("\\|").map(_.split(",")).map(x => x(0).toUpperCase -> x(1).toUpperCase)

val joinExpr: Column = joinKeyPair.map { case (ltable_col, rtable_col) =>left_ds.col(ltable_col) === right_ds.col(rtable_col)}.reduce(_ and _)

left_ds.join(right_ds, joinExpr, "left_outer")

Above is the join expression I was trying but it not working. Is there a way to achieve this if the join column names are different with out using Seq. So if the number of join keys increase ,I should still be able to make the code work dynamically.

Upvotes: 1

Views: 1427

Answers (1)

pasha701
pasha701

Reputation: 7207

With aliases have to work fine:

val conditionArrays = joinKeys.split("\\|").map(c => c.split(","))
val joinExpr = conditionArrays.map { case Array(a, b) => col("a." + a) === col("b." + b) }.reduce(_ and _)
left_ds.alias("a").join(right_ds.alias("b"), joinExpr, "left_outer")

Upvotes: 1

Related Questions