Reputation: 357
I want to create a DataFrame from a JSON document for only one given key. Its value is a list of nested JSON objects. I tried flattening the whole document, but I think there should be a simpler approach, since I only need a single key of the JSON converted into a DataFrame. My JSON looks like this:
("""
{
"Id_columns": 2,
"metadata": [{
"id": "1234",
"type": "file",
"length": 395
}, {
"id": "1235",
"type": "file2",
"length": 396
}]
}""")
Now I want to create a DataFrame using Spark for only the key 'metadata'. I have written this code:
import play.api.libs.json.Json  // Play JSON

val json = Json.parse("""
{
  "Id_columns": 2,
  "metadata": [{
    "id": "1234",
    "type": "file",
    "length": 395
  }, {
    "id": "1235",
    "type": "file2",
    "length": 396
  }]
}""")
val jsonlist = Json.stringify(json("metadata"))
val rddData = spark.sparkContext.parallelize(jsonlist)
val resultDF = spark.read.option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ").json(rddData)
resultDF.show()
But it's giving me this error:
overloaded method value json with alternatives:
cannot be applied to (org.apache.spark.rdd.RDD[Char])
[error] val resultDF = spark.read.option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ").json(rddData)
^
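If I read the error correctly, `parallelize` is treating my `String` as a Scala collection of characters (via the implicit `String` to `Seq[Char]` conversion), so Spark receives an `RDD[Char]` instead of the `RDD[String]` its JSON reader accepts. A minimal illustration of the type mismatch (the value names here are just for illustration):
val chars: org.apache.spark.rdd.RDD[Char] = spark.sparkContext.parallelize("abc")  // String is implicitly a Seq[Char]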
I am expecting this result:
+----+-----+------+
| id |type |length|
+----+-----+------+
|1234|file |   395|
|1235|file2|   396|
+----+-----+------+
Upvotes: 1
Views: 1090
Reputation: 2518
You need to explode your array, like this:
import spark.implicits._
import org.apache.spark.sql.functions._

val df = spark.read.json(
  spark.sparkContext.parallelize(Seq("""{"Id_columns":2,"metadata":[{"id":"1234","type":"file","length":395},{"id":"1235","type":"file2","length":396}]}"""))
)

// explode turns each element of the metadata array into its own row,
// then "metadata.*" flattens the struct fields into columns.
df.select(explode($"metadata").as("metadata"))
  .select("metadata.*")
  .show(false)
Output:
+----+------+-----+
|id |length|type |
+----+------+-----+
|1234|395 |file |
|1235|396 |file2|
+----+------+-----+
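Since you only need the `metadata` key, you can also keep your Play JSON extraction and just fix the `parallelize` call: wrap the stringified array in a `Seq` so you get an `RDD[String]` rather than an `RDD[Char]`. Note that `spark.read.json` on an RDD is deprecated since Spark 2.2 in favor of a `Dataset[String]`. A sketch of that approach, assuming Play JSON (which your `Json.parse`/`Json.stringify` suggest) and the same `spark` session:
import play.api.libs.json.Json
import spark.implicits._

val json = Json.parse("""{"Id_columns":2,"metadata":[{"id":"1234","type":"file","length":395},{"id":"1235","type":"file2","length":396}]}""")

// Stringify only the metadata array: [{"id":"1234",...},{"id":"1235",...}]
val metadataJson = Json.stringify(json("metadata"))

// Spark's JSON reader turns a top-level JSON array into one row per element.
val resultDF = spark.read.json(Seq(metadataJson).toDS)
resultDF.show()
This avoids reading the whole document into a DataFrame and then exploding it, at the cost of round-tripping the array through a string.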
Upvotes: 3