Reputation: 151
I have a JSON document containing an array, in the format below:
{
  "marks": [
    {
      "subject": "Maths",
      "mark": "80"
    },
    {
      "subject": "Physics",
      "mark": "70"
    },
    {
      "subject": "Chemistry",
      "mark": "60"
    }
  ]
}
I need to write each object in the array to a separate JSON file. Is there any way to do this in the Spark shell?
Upvotes: 0
Views: 290
Reputation: 42422
You can explode the marks array of structs into one row per object, add a unique ID column, and write JSON files partitioned by that ID column. Each object then lands in its own file under output/id=<n>/.
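import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

// inline() expands the array of structs into one row per element
df.selectExpr("inline(marks)")
  // IDs are unique per row but not guaranteed to be consecutive
  .withColumn("id", monotonically_increasing_id())
  // keep all rows with the same ID in one partition, so each ID yields one file
  .repartition(col("id"))
  .write
  // one directory per ID; the id value goes into the path, not the file contents
  .partitionBy("id")
  .json("output")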
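The snippet above assumes df already exists. As a rough end-to-end sketch, assuming the sample document is saved as marks.json (a hypothetical path) and that it is pretty-printed across several lines as in the question, loading it would need the multiLine option, since Spark otherwise expects one JSON object per line:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

// In spark-shell a session named `spark` already exists;
// getOrCreate() just returns it rather than starting a new one.
val spark = SparkSession.builder().appName("split-marks").getOrCreate()

// multiLine lets Spark parse a single JSON document that spans many lines.
// "marks.json" is an assumed input path, not taken from the question.
val df = spark.read
  .option("multiLine", true)
  .json("marks.json")

// One row per element of the marks array, plus a unique ID per row.
val rows = df
  .selectExpr("inline(marks)")
  .withColumn("id", monotonically_increasing_id())

// Write one directory per ID under "output"; overwrite makes reruns repeatable.
rows
  .repartition(col("id"))
  .write
  .mode("overwrite")
  .partitionBy("id")
  .json("output")
```

In spark-shell, multi-line chains like these are easiest to enter in :paste mode. Each output/id=<n>/ directory should then hold a single part file containing one object with the subject and mark fields.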
Upvotes: 1