Reputation: 1
I have many JSON files with no fixed structure, and I want to find the deepest elements together with the full path of keys that leads to each one.
For example:
{
  "menu": {
    "id": "file",
    "popup": {
      "menuitem": {
        "module": {
          "-vdsr": "New",
          "-sdst": "Open",
          "-mpoi": "Close"
        }
        ...
      }
    }
  }
}
In this case the result would be:
menu.popup.menuitem.module.-vdsr
menu.popup.menuitem.module.-sdst
menu.popup.menuitem.module.-mpoi
I tried Jackson and Json4s. Both are efficient at reaching a leaf value, but I don't see how to recover the whole structure leading to it.
I want to run this as an Apache Spark job on very large JSON files, each with a complex structure. I also tried Spark SQL, but without knowing the entire structure in advance I can't query it.
Upvotes: 0
Views: 818
Reputation: 5708
The beauty of the new DataFrame API is that it infers the schema automatically as you load the data.
It does so by doing a one-time pass over the whole data set.
I suggest that you load your JSON, then play with transformations and see what you can come up with. You can remove or select a subset of columns, filter rows, aggregate, map over them, anything really.
E.g:
// you can use globbing to load multiple files
val jsonTbl = sqlContext.load("path to json file", "json")
// print the inferred schema
jsonTbl.printSchema
// now you can use the DataFrame API to transform the data set, e.g.
// (field names containing '-' may need backtick quoting, depending on the Spark version)
val outputTbl = jsonTbl
  .filter("menu.popup.menuitem.module.-vdsr = 'Some value'")
  .groupBy("menu.popup.menuitem.module.-vdsr").count
  .select("menu.popup.menuitem.module.-sdst", "other fields/columns")
outputTbl.show
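Since the inferred schema is itself a tree, the dotted paths from the question can probably be read straight off it. The following is only a sketch under some assumptions: `SchemaPaths`, `leafPaths`, and `sampleSchema` are hypothetical names, the schema is built by hand so the example runs without a cluster, and arrays are treated as leaves rather than descended into.

```scala
import org.apache.spark.sql.types.{DataType, StringType, StructField, StructType}

object SchemaPaths {
  // Walk a Spark SQL schema and collect the dotted path of every non-struct field.
  // Assumption: ArrayType and MapType are treated as leaves here.
  def leafPaths(dt: DataType, prefix: String = ""): Seq[String] = dt match {
    case st: StructType =>
      st.fields.toSeq.flatMap { f =>
        val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
        leafPaths(f.dataType, path)
      }
    case _ => Seq(prefix)
  }

  // Hand-built stand-in for the schema Spark would infer from the question's sample.
  val sampleSchema: StructType = StructType(Seq(
    StructField("menu", StructType(Seq(
      StructField("id", StringType),
      StructField("popup", StructType(Seq(
        StructField("menuitem", StructType(Seq(
          StructField("module", StructType(Seq(
            StructField("-vdsr", StringType),
            StructField("-sdst", StringType),
            StructField("-mpoi", StringType)
          )))
        )))
      )))
    )))
  ))
}
```

With a loaded table, the call would be along the lines of `SchemaPaths.leafPaths(jsonTbl.schema)`; the result is a plain `Seq[String]` on the driver, since the schema is metadata rather than distributed data.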
Upvotes: 0
Reputation: 16324
What you're asking to do is essentially a tree traversal of an object, where JSON objects are considered nodes with named branches and other JSON types are considered leaves. There are many ways to do this. You might consider making a recursive function that explores the entire tree. Here is an example that works in Play JSON, but it shouldn't be very different in other libraries:
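import play.api.libs.json._

// Returns the dotted key path of every leaf (non-object) value in the tree.
def unfold(json: JsValue): Seq[String] = json match {
  // toSeq keeps this working whether the fields come back as a Seq or a Map
  case JsObject(kvps) => kvps.toSeq.flatMap {
    case (key, value) =>
      // a leaf yields "", so avoid appending a trailing dot to its key
      unfold(value).map(path => if (path.isEmpty) key else s"$key.$path")
  }
  case _ => Seq("") // leaf: nothing more to append
}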
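The expected output in the question lists only the deepest leaves (it omits `menu.id`), so a depth filter on top of the traversal may be what is wanted. Below is a library-free sketch of that idea. The tiny `Json` ADT and the names `PathSketch`, `paths`, and `deepest` are hypothetical, chosen only so the example is self-contained; with a real library the same recursion would run over its own node types.

```scala
object PathSketch {
  // Minimal stand-in for a JSON tree: objects are nodes, everything else is a leaf.
  sealed trait Json
  final case class JObj(fields: Seq[(String, Json)]) extends Json
  final case class JLeaf(value: String) extends Json

  // Every root-to-leaf path, kept as a list of keys so keys containing '.' stay intact.
  def paths(json: Json): Seq[List[String]] = json match {
    case JObj(fields) => fields.flatMap { case (k, v) => paths(v).map(k :: _) }
    case JLeaf(_)     => Seq(Nil)
  }

  // Keep only the paths of maximal depth and render them in dotted form.
  def deepest(json: Json): Seq[String] = {
    val all = paths(json)
    if (all.isEmpty) Seq.empty
    else {
      val maxDepth = all.map(_.length).max
      all.filter(_.length == maxDepth).map(_.mkString("."))
    }
  }

  // The sample document from the question, without the elided part.
  val sample: Json = JObj(Seq(
    "menu" -> JObj(Seq(
      "id" -> JLeaf("file"),
      "popup" -> JObj(Seq(
        "menuitem" -> JObj(Seq(
          "module" -> JObj(Seq(
            "-vdsr" -> JLeaf("New"),
            "-sdst" -> JLeaf("Open"),
            "-mpoi" -> JLeaf("Close")
          ))
        ))
      ))
    ))
  ))
}
```

Calling `PathSketch.deepest(PathSketch.sample)` gives the three `menu.popup.menuitem.module.*` paths in field order, while `PathSketch.paths` also includes `menu.id`.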
Upvotes: 0