yog

Reputation: 209

How to get values from a nested JSON array using Spark?

I have this JSON:

val myJson = """{
  "record": {
    "recordId": 100,
    "name": "xyz",
    "version": "1.1",
    "input": [
      {
        "format": "Database",
        "type": "Oracle",
        "connectionStringId": "212",
        "connectionString": "ksfksfklsdflk",
        "schemaName": "schema1",
        "databaseName": "db1",
        "tables": [
          {
            "table_name": "one"
          },
          {
            "table_name": "two"
          }
        ]
      }
    ]
  }
}"""

I am using this code to load the JSON into a DataFrame:

import sparkSession.implicits._ // needed for .toDS()
val df = sparkSession.read.json(Seq(myJson).toDS())

I want the values of schemaName & databaseName. How can I get them?

val schemaName = df.select("record.input.schemaName") //not working

Someone, please help me

Upvotes: 0

Views: 988

Answers (1)

blackbishop

Reputation: 32660

You need to explode the array column record.input, then select the fields you want:

import org.apache.spark.sql.functions.{col, explode}

df.select(explode(col("record.input")).as("inputs"))
  .select("inputs.schemaName", "inputs.databaseName")
  .show

//+----------+------------+
//|schemaName|databaseName|
//+----------+------------+
//|   schema1|         db1|
//+----------+------------+
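For reference, here is a minimal end-to-end sketch that parses the JSON string from the question and extracts the two fields. It assumes a SparkSession named spark and uses the toDS implicit to turn the string into a Dataset[String] before reading it:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Parse the JSON string into a DataFrame (record.input is an array of structs)
val df = spark.read.json(Seq(myJson).toDS())

// One row per element of record.input, then pick the nested fields
val result = df
  .select(explode(col("record.input")).as("inputs"))
  .select("inputs.schemaName", "inputs.databaseName")

result.show()
// +----------+------------+
// |schemaName|databaseName|
// +----------+------------+
// |   schema1|         db1|
// +----------+------------+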

Upvotes: 1
