None
None

Reputation: 1468

How to parse nested JSON objects in Spark SQL?

I have a schema as shown below. How can I parse the nested objects?

root
 |-- apps: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- appName: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- Ratings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- rating: long (nullable = true)
 |-- id: string (nullable = true)

Upvotes: 24

Views: 66247

Answers (5)

Adelina Balasa
Adelina Balasa

Reputation: 91

Have you tried doing it straight from the SQL query like

Select apps.element.Ratings from yourTableName

This will probably return an array and you can more easily access the elements inside. Also, I use this online JSON viewer when I have to deal with large JSON structures and the schema is too complex.

Upvotes: 3

filip stepniak
filip stepniak

Reputation: 109

I am using PySpark, but the logic should be similar. I found this way of parsing my nested JSON useful:

df.select(df.apps.appName.alias("apps_Name"), \
          df.apps.appPackage.alias("apps_Package"), \
          df.apps.Ratings.date.alias("apps_Ratings_date")) \
   .show()

The code could be obviously shortened with a f-string.

Upvotes: 3

Ganesh
Ganesh

Reputation: 757

var df = spark.read.format("json").load("/path/to/file")
df.createOrReplaceTempView("df");
spark.sql("select apps.element.Ratings from df where apps.element.appName like '%app_name%' ").show()

Upvotes: 1

ben jarman
ben jarman

Reputation: 1138

Try this:

val nameAndAddress = sqlContext.sql("""
    SELECT name, address.city, address.state
    FROM people
""")
nameAndAddress.collect.foreach(println)

Source: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html

Upvotes: 5

Vasilis Vagias
Vasilis Vagias

Reputation: 416

Assuming you read in a json file and print the schema you are showing us like this:

DataFrame df = sqlContext.read().json("/path/to/file").toDF();
    df.registerTempTable("df");
    df.printSchema();

Then you can select nested objects inside a struct type like so...

DataFrame app = df.select("app");
        app.registerTempTable("app");
        app.printSchema();
        app.show();
DataFrame appName = app.select("element.appName");
        appName.registerTempTable("appName");
        appName.printSchema();
        appName.show();

Upvotes: 28

Related Questions