Reputation: 11
I have a DF in the following structure
Col1. Col2 Col3
Data1Col1,Data2Col1. Data1Col2,Data2Col2. Data1Col3,Data2Col3
I want the resultant dataset to be of the following type:
Col1 Col2 Col3
Data1Col1. Data1Col2. Data1Col3
Data2Col1. Data2Col2 Data2Col3
Please suggest me how to approach this. I have tried explode , but that results in duplicate rows.
Upvotes: 1
Views: 648
Reputation: 94
val df = Seq(("C,D,E,F","M,N,O,P","K,P,B,P")).toDF("Col1","Col2","Col3")
df.show
+-------+-------+-------+
| Col1| Col2| Col3|
+-------+-------+-------+
|C,D,E,F|M,N,O,P|K,P,B,P|
+-------+-------+-------+
val res1 = df.withColumn("Col1",split(col("Col1"),",")).withColumn("Col2",split(col("Col2"),",")).withColumn("Col3",split(col("Col3"),","))
res1.show
+------------+------------+------------+
| Col1| Col2| Col3|
+------------+------------+------------+
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|
+------------+------------+------------+
val zip = udf((x: Seq[String], y: Seq[String], z: Seq[String]) => z.zip(x.zip(y)))
val res14 = res1.withColumn("test",explode(zip(col("Col1"),col("Col2"),col("Col3")))).show
+------------+------------+------------+-----------+
| Col1| Col2| Col3| test|
+------------+------------+------------+-----------+
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[K, [C, M]]|
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[P, [D, N]]|
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[B, [E, O]]|
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[P, [F, P]]|
+------------+------------+------------+-----------+
res14.withColumn("t3",col("test._1")).withColumn("tn",col("test._2")).withColumn("t2",col("tn._2")).withColumn("t1",col("tn._1")).select("t1","t2","t3").show
+---+---+---+
| t1| t2| t3|
+---+---+---+
| C| M| K|
| D| N| P|
| E| O| B|
| F| P| P|
+---+---+---+
res1 - Initial Dataframe
res14 - intermediate Df
Upvotes: 1