Reputation: 1
{
"config1":{
"url":"xxxx",
"database":"xxxx",
"dbTable":"xxxx"
},
"config2":{
"url":"xxxx",
"database":"xxxxx",
"dbTable":"xxxxx"
},
"snippets":{
"optionA":{
"months_back":"2",
"list":{
"code1":{
"id":"11111",
"country":"11111"
},
"code2":{
"id":"2222",
"country":"2222"
},
"code3":{
"id":"3333",
"country":"3333"
}
}
}
}
}
let's say I have a config.json that looks like that, I have some code with a query I need to swap parameters with the id and country in that json
So far my code is something like this
import spark.implicits._
val df = sqlContext.read.option("multiline","true").json("path_to_json")
val range_df = df.select("snippets.optionA.months_back").collect()
val range_str = range_df.map(x => x.get(0))
val range = range_str(0)
val list = df.select("snippets.optionA.list.*")).collect()
I need something like
For(x <- json_list){
val results = spark.sql("""
select * from table
where date >= add_months(current_date(), -"""+range+""")
and country = """+json_list(country)+"""
and id = """+json_lis(id)+""")
the List after collect() is list: Array[org.apache.spark.sql.Row] and I have no idea how to iterate over it.
Any help is welcome, thank you
Upvotes: 0
Views: 434
Reputation: 10382
Convert snippets.optionA.list.*
inner struct into array(snippets.optionA.list.*)
& iterate each value from this array.
Check below code.
val queriesResult = df
.withColumn(
"query",
explode(
expr(
"""
|transform(
| array(snippets.optionA.list.*),
| v -> concat(
| 'SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -',
| snippets.optionA.months_back,
| ') AND country=\"',
| v.country,
| '\" AND id =',
| v.id
| )
|)
|""".stripMargin
)
)
)
.select("query")
.as[String]
.collect
.map { query =>
spark.sql(query)
}
.collect function will return array of queries like below, then using map
function to pass each query
to spark.sql
function to execute query.
Array(
"SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -2) AND country="11111" AND id =11111",
"SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -2) AND country="2222" AND id =2222",
"SELECT * FROM TABLE WHERE DATE >= add_months(current_date(), -2) AND country="3333" AND id =3333"
)
Spark Version >= 2.4 +
Upvotes: 1