user2919673
user2919673

Reputation: 1

Scala - iterate over structs in json and get values

{
    "config1":{
        "url":"xxxx",
        "database":"xxxx",
        "dbTable":"xxxx"
    },
    "config2":{
        "url":"xxxx",
        "database":"xxxxx",
        "dbTable":"xxxxx"
    },
    "snippets":{
        "optionA":{
            "months_back":"2",
            "list":{
                "code1":{
                    "id":"11111",
                    "country":"11111"
                },
                "code2":{
                    "id":"2222",
                    "country":"2222"
                },
                "code3":{
                    "id":"3333",
                    "country":"3333"
                }
            }
        }
    }
}

let's say I have a config.json that looks like that, I have some code with a query I need to swap parameters with the id and country in that json

So far my code is something like this

import spark.implicits._
val df = sqlContext.read.option("multiline","true").json("path_to_json")

val range_df = df.select("snippets.optionA.months_back").collect()
val range_str = range_df.map(x => x.get(0))
val range = range_str(0) 
val list = df.select("snippets.optionA.list.*")).collect()

I need something like

For(x <- json_list){

val results = spark.sql("""
  select * from table
  where date >= add_months(current_date(), -"""+range+""")
  and country = """+json_list(country)+"""
  and id = """+json_lis(id)+""")

the List after collect() is list: Array[org.apache.spark.sql.Row] and I have no idea how to iterate over it.

Any help is welcome, thank you

Upvotes: 0

Views: 434

Answers (1)

s.polam
s.polam

Reputation: 10382

Convert snippets.optionA.list.* inner struct into array(snippets.optionA.list.*) & iterate each value from this array.

Check below code.

val queriesResult = df
    .withColumn(
      "query",
      explode(
        expr(
          """
            |transform(
            |     array(snippets.optionA.list.*),
            |     v -> concat(
            |                 'SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -',
            |                  snippets.optionA.months_back,
            |                 ') AND country=\"',
            |                 v.country,
            |                 '\" AND id =',
            |                 v.id
            |      )
            |)
            |""".stripMargin

        )
      )
    )
    .select("query")
    .as[String]
    .collect 
    .map { query =>
      spark.sql(query)
    }

.collect function will return array of queries like below, then using map function to pass each query to spark.sql function to execute query.

Array(
"SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -2) AND country="11111" AND id =11111", 
"SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -2) AND country="2222" AND id =2222", 
"SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -2) AND country="3333" AND id =3333"
)

Spark Version >= 2.4 +

Upvotes: 1

Related Questions