shrey pavagadhi

Reputation: 11

Split a JSON array into two rows in Spark (Scala)

I have a DataFrame like this:

root
 |-- runKeyId: string (nullable = true)
 |-- entities: string (nullable = true)
+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities                                                                                    |
+--------+--------------------------------------------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}|
+--------+--------------------------------------------------------------------------------------------+

and I would like to explode it into the following, using Scala:

+--------+------------------------------------------------------+
|runKeyId|entities                                              |
+--------+------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339}|
|2       |{"Partition":{"Name":"DDD"},"id":339}                 |
+--------+------------------------------------------------------+

Upvotes: 1

Views: 549

Answers (1)

koiralo

Reputation: 23099

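It looks like the entities string is not valid JSON: it holds two objects separated by a comma, with no enclosing array. Fix that first by wrapping the string in [ and ], and then you can parse it as a JSON array and explode it as below.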

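// Requires Spark 2.4 or later (schema_of_json, and from_json with a Column schema)
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(
  ("1", "{\"Partition\":[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}],\"id\":339},{\"Partition\":{\"Name\":\"DDD\"},\"id\":339}")
).toDF("runKeyId", "entities")
  // Fix the JSON: wrap the comma-separated objects in [ ] to form a valid array
  .withColumn("entities", concat(lit("["), $"entities", lit("]")))

// Infer the schema from the first row, parse the array, and explode it into one row per element
val resultDF = df.withColumn("entities",
  explode(from_json($"entities", schema_of_json(df.select($"entities").first().getString(0))))
).withColumn("entities", to_json($"entities")) // serialize each element back to a JSON string

resultDF.show(false)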

Output:

+--------+----------------------------------------------------------------+
|runKeyId|entities                                                        |
+--------+----------------------------------------------------------------+
|1       |{"Partition":"[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}]","id":339}|
|1       |{"Partition":"{\"Name\":\"DDD\"}","id":339}                     |
+--------+----------------------------------------------------------------+
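
Note that in the output above, Partition comes back as an escaped string rather than nested JSON. This is presumably because it is an array in one element and an object in the other, so schema inference falls back to a string type. If the original text of each object has to be preserved exactly, one alternative is to split the string yourself instead of parsing it. The following is only a minimal sketch of that idea in plain Scala: JsonSplit and splitTopLevelObjects are hypothetical names, not part of the answer above or of Spark, and the code assumes the input is a sequence of well-formed top-level JSON objects or arrays.

```scala
// Hypothetical helper (not part of Spark or of the answer above).
// Splits a string of comma-separated top-level JSON values into one
// substring per value by tracking bracket depth, without parsing the JSON.
object JsonSplit {
  def splitTopLevelObjects(s: String): Seq[String] = {
    val out = Seq.newBuilder[String]
    var depth = 0         // current nesting depth of {} and []
    var inString = false  // true while inside a JSON string literal
    var escaped = false   // true right after a backslash inside a string
    var start = -1        // index where the current top-level value began

    for (i <- s.indices) {
      val c = s.charAt(i)
      if (inString) {
        // Brackets inside string literals must not affect the depth.
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else {
        c match {
          case '"' => inString = true
          case '{' | '[' =>
            if (depth == 0) start = i
            depth += 1
          case '}' | ']' =>
            if (depth > 0) depth -= 1
            if (depth == 0 && start >= 0) {
              // A top-level value just closed: emit its original text.
              out += s.substring(start, i + 1)
              start = -1
            }
          case _ => () // commas and whitespace between values are skipped
        }
      }
    }
    out.result()
  }
}
```

In Spark, a function like this could be wrapped with udf from org.apache.spark.sql.functions and applied to the original, unwrapped entities column, with explode then producing one row per returned string. That wiring is left out here and has not been verified against the data in the question.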

Upvotes: 1
