Reputation: 11
I have a dataframe like this:
root
|-- runKeyId: string (nullable = true)
|-- entities: string (nullable = true)
+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities                                                                                    |
+--------+--------------------------------------------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}|
+--------+--------------------------------------------------------------------------------------------+
I would like to explode it into one row per entity, like this, using Scala:
+--------+------------------------------------------------------+
|runKeyId|entities                                              |
+--------+------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339}|
|2       |{"Partition":{"Name":"DDD"},"id":339}                 |
+--------+------------------------------------------------------+
Upvotes: 1
Views: 549
Reputation: 23099
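The entities column does not hold valid JSON: it contains two objects separated by a comma, with no enclosing array. Fix the JSON first by wrapping the value in [ and ], then parse it as JSON and explode it as below.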
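import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(
  ("1", "{\"Partition\":[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}],\"id\":339},{\"Partition\":{\"Name\":\"DDD\"},\"id\":339}")
).toDF("runKeyId", "entities")
  // fix the JSON: wrap the comma-separated objects in [ ] to form a valid array
  .withColumn("entities", concat(lit("["), $"entities", lit("]")))

// infer the schema from the first row, parse, explode, then serialize each element back to JSON
val resultDF = df.withColumn("entities",
  explode(from_json($"entities", schema_of_json(df.select($"entities").first().getString(0))))
).withColumn("entities", to_json($"entities"))

resultDF.show(false)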
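Output (Partition comes back as an escaped string because it is an array in one object and a plain object in the other, so the inferred schema falls back to string for that field):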
+--------+----------------------------------------------------------------+
|runKeyId|entities                                                        |
+--------+----------------------------------------------------------------+
|1       |{"Partition":"[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}]","id":339}|
|1       |{"Partition":"{\"Name\":\"DDD\"}","id":339}                     |
+--------+----------------------------------------------------------------+
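In the output above, Partition is an escaped string rather than nested JSON, which is not quite what the question asked for. If the original text of each entity must be preserved, one alternative is to split the string at the top-level object boundaries instead of parsing it. The following is only a rough sketch in plain Scala, not part of the answer's approach: SplitEntities and splitTopLevel are hypothetical names, and it assumes the column always holds well-formed JSON objects separated by commas.

```scala
// Hypothetical helper (not from the original answer): splits a string of
// comma-separated top-level JSON objects into one string per object,
// leaving the text of each object untouched.
object SplitEntities {
  def splitTopLevel(s: String): Seq[String] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    var depth = 0          // current nesting depth of {} and []
    var inString = false   // true while inside a JSON string literal
    var escaped = false    // true right after a backslash inside a string
    var start = -1         // index where the current top-level value began
    for (i <- s.indices) {
      val c = s.charAt(i)
      if (inString) {
        // Braces and brackets inside string literals must not affect depth
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' | '[' =>
          if (depth == 0) start = i
          depth += 1
        case '}' | ']' =>
          depth -= 1
          if (depth == 0 && start >= 0) {
            out += s.substring(start, i + 1)
            start = -1
          }
        case _ => // commas and whitespace between objects are skipped
      }
    }
    out.toSeq
  }
}
```

In Spark, this could presumably be wrapped with udf(SplitEntities.splitTopLevel _) and the resulting array column passed to explode, though that has not been checked against the asker's data.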
Upvotes: 1