Reputation: 467
My UDF returns a JSON object array as a string; how can I expand the array into DataFrame rows?
If that isn't possible, is there another way (such as using a struct) to achieve this?
Here is my JSON data:
sample json
{
  "items": [ {"Name": "test", "Id": "1"}, {"Name": "sample", "Id": "2"} ]
}
And here is how I want it to end up like:
test, 1
sample, 2
Upvotes: 0
Views: 524
Reputation: 655
The idea: Spark can read any parallelized collection, so we take the JSON string, parallelize it into a Dataset, and read it with the JSON reader.
Code =>
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for .toDS on a Seq

val sampleJsonStr = """
{
  "items":[ {"Name":"test", "Id":"1"}, {"Name":"sample", "Id":"2"}]
}"""

val jsonDf = spark.read.option("multiLine", "true").json(Seq(sampleJsonStr).toDS)
// jsonDf: org.apache.spark.sql.DataFrame = [items: array<struct<Id:string,Name:string>>]

// Finally, explode the JSON array into one row per element
val explodedDf = jsonDf.
  select("items").
  withColumn("exploded_items", explode(col("items"))).
  select(col("exploded_items.Id"), col("exploded_items.Name"))
Output =>
scala> explodedDf.show(false)
+---+------+
|Id |Name |
+---+------+
|1 |test |
|2 |sample|
+---+------+
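Since the question mentions the JSON arriving as a string column from a UDF, an alternative is to parse that column in place with `from_json` (available since Spark 2.1) instead of round-tripping through `spark.read.json`. This is a minimal sketch; the `json_str` column name and the app setup are illustrative, and the schema must be adjusted to match your actual payload:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object FromJsonExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("from_json-demo").getOrCreate()
    import spark.implicits._

    // Stand-in for a UDF output: the JSON array as a plain string column.
    val df = Seq("""{"items":[{"Name":"test","Id":"1"},{"Name":"sample","Id":"2"}]}""").toDF("json_str")

    // Schema describing the JSON payload; field types here assume strings.
    val schema = new StructType()
      .add("items", ArrayType(new StructType()
        .add("Name", StringType)
        .add("Id", StringType)))

    // Parse the string, explode the array, and pull out the struct fields.
    val exploded = df
      .withColumn("parsed", from_json(col("json_str"), schema))
      .withColumn("item", explode(col("parsed.items")))
      .select(col("item.Name"), col("item.Id"))

    exploded.show(false)
    spark.stop()
  }
}
```

This avoids an extra read and works row-by-row on whatever column the UDF produced, so it composes with the rest of an existing DataFrame pipeline.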
Upvotes: 1