user9297554

Reputation: 467

How do you explode an array of JSON string into rows?

My UDF returns a JSON object array as a string. How can I expand the array into DataFrame rows?

If it isn't possible, is there any other way (like using Struct) to achieve this?

Here is my JSON data:

sample json
{
"items":[ {"Name":"test", "Id":"1"}, {"Name":"sample", "Id":"2"}]
}

And here is how I want it to end up like:

test, 1
sample, 2

Upvotes: 0

Views: 524

Answers (1)

SanBan

Reputation: 655

The idea is that Spark can read any parallelized collection, so we wrap the JSON string in a `Dataset[String]` and let `spark.read.json` parse it.

Code =>

import org.apache.spark.sql.functions._
import spark.implicits._  // needed for .toDS on a Seq

val sampleJsonStr = """
{
"items":[ {"Name":"test", "Id":"1"}, {"Name":"sample", "Id":"2"}]
}"""

val jsonDf = spark.read.option("multiLine","true").json(Seq(sampleJsonStr).toDS)
//jsonDf: org.apache.spark.sql.DataFrame = [items: array<struct<Id:string,Name:string>>]

// Finally we explode the json array
val explodedDf = jsonDf.
select("items").
withColumn("exploded_items",explode(col("items"))).
select(col("exploded_items.Id"),col("exploded_items.Name"))

Output =>

scala> explodedDf.show(false)
+---+------+
|Id |Name  |
+---+------+
|1  |test  |
|2  |sample|
+---+------+
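If the JSON string already sits in a column of an existing DataFrame (as it would coming out of a UDF), you don't need a round trip through `spark.read.json` — you can parse it in place with `from_json` and then `explode`. A minimal sketch, assuming a hypothetical column name `json_str` and the schema implied by the sample data:

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._

// Schema matching {"items":[{"Name":..., "Id":...}, ...]}
val schema = StructType(Seq(
  StructField("items", ArrayType(StructType(Seq(
    StructField("Name", StringType),
    StructField("Id", StringType)
  ))))
))

// Hypothetical stand-in for the UDF output column
val df = Seq("""{"items":[{"Name":"test","Id":"1"},{"Name":"sample","Id":"2"}]}""")
  .toDF("json_str")

val explodedDf = df
  .withColumn("parsed", from_json(col("json_str"), schema))          // string -> struct
  .withColumn("item", explode(col("parsed.items")))                   // array -> rows
  .select(col("item.Id"), col("item.Name"))
```

This avoids collecting the strings to the driver, so it scales to a full column rather than a single literal.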

Upvotes: 1
