Reputation: 192
I have a dataset like:
+---+--------------+
| id| name_and_age|
+---+--------------+
| 1| Anu,23 |
| 2|Suresh,24 |
| 3| Usha,24 |
| 4| Nisha,25 |
+---+--------------+
and
+---+----+
|id2| age|
+---+----+
| a| 23|
| b| 24|
| c| 24|
| d| 25|
+---+----+
I need to join both datasets, so I am using contains:
dataset1.join(dataset2, dataset1.col("name_and_age").contains(dataset2.col("age")), "inner")
but I am not getting the correct results. Are there any other ways to do this join? Please note that there are no other columns to use in the join condition.
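A runnable version of what I tried, assuming the datasets are named dataset1 and dataset2 as shown above (the age column is cast to a string so contains compares like with like):
import org.apache.spark.sql.functions.col
val joined = dataset1.join(
  dataset2,
  col("name_and_age").contains(dataset2.col("age").cast("string")),
  "inner"
)
joined.show()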
Upvotes: 0
Views: 634
Reputation: 14845
Add an age column to dataset1 and use this column to join.
import org.apache.spark.sql.functions._
import spark.implicits._

// Split name_and_age on the comma and take the second part as the age,
// then join the two datasets on the new age column.
val dataset3 = dataset1.withColumn("age", split('name_and_age, ",").getItem(1))
dataset3.join(dataset2, "age").show()
Output:
+---+---+------------+---+
|age| id|name_and_age|id2|
+---+---+------------+---+
| 23| 1| Anu,23| a|
| 25| 4| Nisha,25| d|
| 24| 2| Suresh,24| b|
| 24| 2| Suresh,24| c|
| 24| 3| Usha,24| b|
| 24| 3| Usha,24| c|
+---+---+------------+---+
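If name_and_age carries stray whitespace, or if the extracted age stays a string while dataset2's age is an integer, the join keys may not match exactly. A minimal variant that trims and casts before joining (same dataset and column names assumed):
import org.apache.spark.sql.functions._
import spark.implicits._
// Trim whitespace first, then cast the extracted age to int so both
// join keys have the same type as dataset2's age column.
val dataset4 = dataset1.withColumn("age", split(trim('name_and_age), ",").getItem(1).cast("int"))
dataset4.join(dataset2, "age").show()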
Upvotes: 1