Anna

Reputation: 571

Spark split a column value into multiple rows

My problem is I have a table like this:

------------------------
A  B    C
------------------------
a1 b1   c1|c2|c3|c4

c1|c2|c3|c4 is one value separated by |.

My final result should look like this:

---------
A  B   C
---------
a1 b1  c1
a1 b1  c2
a1 b1  c3
a1 b1  c4

How do I do this?

Thanks

Upvotes: 2

Views: 4947

Answers (1)

koiralo

Reputation: 23119

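You can split the string on the pipe character and then explode the resulting array into one row per element, using Spark's built-in functions. Note that split takes a regular expression, so the pipe has to be escaped: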

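import org.apache.spark.sql.functions._
// Needs a SparkSession named `spark` in scope (as in spark-shell); enables toDF and $"..."
import spark.implicits._

val df = Seq(("a1", "b1", "c1|c2|c3|c4")).toDF("A", "B", "C")

// split returns an array column; explode emits one row per array element
df.withColumn("C", explode(split($"C", "\\|"))).show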

Output:

+---+---+---+
|  A|  B|  C|
+---+---+---+
| a1| b1| c1|
| a1| b1| c2|
| a1| b1| c3|
| a1| b1| c4|
+---+---+---+
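
If it helps to see what split plus explode is doing, here is a minimal plain-Scala sketch of the same idea without Spark. Row3, SplitExplodeSketch and explodeC are hypothetical names made up for this illustration, not part of any Spark API; flatMap plays the role of explode here.

```scala
// Hypothetical stand-in for one row of the table (not a Spark type).
final case class Row3(a: String, b: String, c: String)

object SplitExplodeSketch {
  // String.split takes a regex. An unescaped "|" is an empty alternation that
  // matches the empty string, so it would not split on the pipe itself.
  // Escaping it as "\\|" splits on the literal pipe character.
  def explodeC(rows: Seq[Row3]): Seq[Row3] =
    rows.flatMap { r =>
      // One output row per piece, with A and B copied unchanged.
      r.c.split("\\|").toSeq.map(part => r.copy(c = part))
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row3("a1", "b1", "c1|c2|c3|c4"))
    explodeC(rows).foreach(println)
  }
}
```

The Spark version above does the same thing per row, just distributed and expressed on columns instead of case classes.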

Hope this helps!

Upvotes: 8
