ak17

Reputation: 192

Join using contains in spark sql

I have a dataset like:

+---+--------------+
| id|  name_and_age|
+---+--------------+
|  1|   Anu,23     |
|  2|Suresh,24     |
|  3|  Usha,24     |
|  4| Nisha,25     |
+---+--------------+

and

+---+----+
|id2| age|
+---+----+
|  a|  23|
|  b|  24|
|  c|  24|
|  d|  25|
+---+----+

I need to join both datasets, so I am using contains:

dataset1.join(dataset2, dataset1.col("name_and_age").contains(dataset2.col("age")), "inner")

but I am not getting correct results. Are there any other ways to join these? Please note that there are no other columns to use in the join condition.

Upvotes: 0

Views: 634

Answers (1)

werner

Reputation: 14845

Add an age column to dataset1 and use this column to join:

import org.apache.spark.sql.functions._
import spark.implicits._

val dataset3 = dataset1.withColumn("age", split('name_and_age, ",").getItem(1))
dataset3.join(dataset2, "age").show()

Output:

+---+---+------------+---+
|age| id|name_and_age|id2|
+---+---+------------+---+
| 23|  1|      Anu,23|  a|
| 25|  4|    Nisha,25|  d|
| 24|  2|   Suresh,24|  b|
| 24|  2|   Suresh,24|  c|
| 24|  3|     Usha,24|  b|
| 24|  3|     Usha,24|  c|
+---+---+------------+---+
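The same split-then-equi-join logic can be sanity-checked outside Spark with plain Scala collections. This is only an illustrative stand-in (the tuples mirror the example data above, and `split(",")(1)` plays the role of `split('name_and_age, ",").getItem(1)`), not the Spark job itself:

```scala
// Plain-Scala sketch of the approach: extract the age from "name,age",
// then equi-join on it, mirroring withColumn + join above.
object SplitJoinSketch {
  val dataset1 = Seq((1, "Anu,23"), (2, "Suresh,24"), (3, "Usha,24"), (4, "Nisha,25"))
  val dataset2 = Seq(("a", "23"), ("b", "24"), ("c", "24"), ("d", "25"))

  def joined: Seq[(String, Int, String, String)] =
    for {
      (id, nameAndAge) <- dataset1
      // take the part after the comma and drop stray whitespace
      age = nameAndAge.split(",")(1).trim
      (id2, age2) <- dataset2
      if age == age2
    } yield (age, id, nameAndAge, id2)
}
```

As in the Spark output, each age-24 row in dataset1 matches both `b` and `c`, so the join yields six rows.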

Upvotes: 1
