Baptiste

Reputation: 1

How to get the structure of JSON in Scala

I have a lot of JSON files with no fixed structure, and I want to find each deepest element together with the full path of elements leading to it.

For example:

{
"menu": {
    "id": "file",
    "popup": {
        "menuitem": {
                  "module": {
                      "-vdsr": "New",
                      "-sdst": "Open",
                      "-mpoi": "Close" }
        ...
    }
}

In this case the result would be:

menu.popup.menuitem.module.-vdsr
menu.popup.menuitem.module.-sdst
menu.popup.menuitem.module.-mpoi

I tried Jackson and Json4s. Both are efficient at reaching the last value, but I don't see how I can get the whole structure.

I want to run this as an Apache Spark job on very large JSON files, each with a very complex structure. I also tried Spark SQL, but without knowing the entire structure in advance I can't get at it.

Upvotes: 0

Views: 818

Answers (2)

Marko Bonaci

Reputation: 5708

The beauty of the new DataFrame API is that it infers the schema automatically as you load the data.
It does so by making a one-time pass over the whole data set.

I suggest that you load your JSON, then play with transformations and see what you can come up with. You can remove or select a subset of columns, filter rows, aggregate, map over them, anything really.

E.g.:

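// you can use globbing to load multiple files
// (from Spark 1.4 on, sqlContext.read.json(path) is the equivalent call)
val jsonTbl = sqlContext.load("path to json file", "json")

// print the inferred schema
jsonTbl.printSchema()

// now you can use the DataFrame API to transform the data set, e.g.
// (inside a SQL expression string, a field name starting with "-" needs backticks)
val outputTbl = jsonTbl
    .filter("menu.popup.menuitem.module.`-vdsr` = 'Some value'")
    .select("menu.popup.menuitem.module.-sdst") // add other fields/columns as needed

// or, separately, count the rows per distinct value of a field;
// after groupBy(...).count only the grouping column and "count" remain
val countsTbl = jsonTbl
    .groupBy("menu.popup.menuitem.module.-vdsr").count()

outputTbl.show()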
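
To get the paths themselves rather than a printed tree, one option is to walk the inferred schema recursively. The following is only a sketch: leafPaths is a hypothetical helper, not part of Spark, and the schema below is built by hand to mirror part of the question's example instead of being inferred from a file.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): collect the dotted path of every leaf field.
def leafPaths(dataType: DataType, prefix: String = ""): Seq[String] = dataType match {
  case struct: StructType =>
    struct.fields.toSeq.flatMap { field =>
      val path = if (prefix.isEmpty) field.name else s"$prefix.${field.name}"
      leafPaths(field.dataType, path)
    }
  // Arrays are treated as transparent: descend into the element type under the same path.
  case ArrayType(elementType, _) => leafPaths(elementType, prefix)
  // Any other type is a leaf, so the accumulated prefix is its full path.
  case _ => Seq(prefix)
}

// Hand-built schema standing in for an inferred one (assumption for illustration).
val module = StructType(Seq(
  StructField("-vdsr", StringType),
  StructField("-sdst", StringType),
  StructField("-mpoi", StringType)))
val menuitem = StructType(Seq(StructField("module", module)))
val popup = StructType(Seq(StructField("menuitem", menuitem)))
val menu = StructType(Seq(
  StructField("id", StringType),
  StructField("popup", popup)))
val schema = StructType(Seq(StructField("menu", menu)))

leafPaths(schema).foreach(println)
```

With a loaded DataFrame, the same helper could be called as leafPaths(jsonTbl.schema). Since the schema is small compared to the data, this walk runs on the driver and does not need to be distributed.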

Upvotes: 0

Ben Reich

Reputation: 16324

What you're asking to do is essentially a tree traversal of an object, where JSON objects are considered nodes with named branches and other JSON types are considered leaves. There are many ways to do this. You might consider making a recursive function that explores the entire tree. Here is an example that works in Play JSON, but it shouldn't be very different in other libraries:

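import play.api.libs.json._

// Returns the dotted path of every leaf value under `json`.
def unfold(json: JsValue): Seq[String] = json match {
    case JsObject(kvps) => kvps.toSeq.flatMap {
        // nested object: prefix each path found below with this key
        case (key, value: JsObject) => unfold(value).map(path => s"$key.$path")
        // any other value is a leaf, so the key alone ends the path
        case (key, _) => Seq(key)
    }
    // a non-object at the top level has no named branches
    case _ => Seq.empty
}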
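
The same traversal can be sketched without any JSON library, which may help when porting it to Jackson or Json4s. Everything below is an assumption made for illustration: the Json, JObject, JArray and JLeaf types are a minimal stand-in model, not a real library's classes, and paths is a hypothetical name. This variant carries the path downward as an accumulated prefix and also descends into arrays.

```scala
// Minimal stand-in JSON model (hypothetical, for illustration only).
sealed trait Json
case class JObject(fields: Seq[(String, Json)]) extends Json
case class JArray(items: Seq[Json]) extends Json
case class JLeaf(value: String) extends Json

// Collect the dotted path of every leaf, passing the path built so far as `prefix`.
def paths(json: Json, prefix: String = ""): Seq[String] = json match {
  case JObject(fields) =>
    fields.flatMap { case (key, value) =>
      paths(value, if (prefix.isEmpty) key else s"$prefix.$key")
    }
  // Array elements share their parent's path; distinct drops repeats across elements.
  case JArray(items) => items.flatMap(item => paths(item, prefix)).distinct
  case JLeaf(_) => Seq(prefix)
}

// Hand-built excerpt of the question's example document.
val module = JObject(Seq(
  "-vdsr" -> JLeaf("New"),
  "-sdst" -> JLeaf("Open"),
  "-mpoi" -> JLeaf("Close")))
val popup = JObject(Seq("menuitem" -> JObject(Seq("module" -> module))))
val doc = JObject(Seq("menu" -> JObject(Seq("id" -> JLeaf("file"), "popup" -> popup))))

paths(doc).foreach(println)
// menu.id
// menu.popup.menuitem.module.-vdsr
// menu.popup.menuitem.module.-sdst
// menu.popup.menuitem.module.-mpoi
```

Passing the prefix down means a leaf never has to return an empty placeholder, so no path ends in a stray dot.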

Upvotes: 0
