tschmit007

Reputation: 7800

Is it possible (and how) to specify an SQL query on the command line with spark-submit?

I have the following code:

def main(args: Array[String]) {
    var dvfFiles: String = "g:/data/gouv/dvf/raw"
    var q: String = ""
    //q = "SELECT distinct DateMutation, NVoie, IndVoie, Voie, Valeur, CodeTypeLocal, TypeLocal, Commune FROM mutations WHERE Commune = 'ICI' and Valeur > 100000 and CodeTypeLocal in (1, 2) order by Valeur desc"

    // Walk the arguments as ("--flag", value) pairs
    args.sliding(2, 2).toList.collect {
        case Array("--sfiles", argFiles: String) => dvfFiles = argFiles
        case Array("--squery", argQ: String) => q = argQ
    }
    println(s"files from: ${dvfFiles}")
    // ... rest of main elided
}

If I run the following command:

G:\dev\fromGit\dvf\spark>spark-submit .\target\scala-2.11\dfvqueryer_2.11-1.0.jar \
--squery "SELECT distinct DateMutation, NVoie, IndVoie, Voie, Valeur, CodeTypeLocal, \
TypeLocal, Commune FROM mutations WHERE (Commune = 'ICI') and (Valeur > 100000) and (CodeTypeLocal in (1, 2)) order by Valeur desc"

I got the following result:

== SQL ==

SELECT distinct DateMutation, NVoie, IndVoie, Voie, Valeur, CodeTypeLocal, TypeLocal, Commune FROM mutations WHERE (Commune = 'ICI') and (Valeur and (CodeTypeLocal in (1, 2)) order by Valeur desc
----------------------------------------------------------------------------------------------^^^

The ^^^ points at the FROM.

I also notice that the > 100000 after Valeur is missing.

The query itself is correct: if I uncomment the //q = ... line, package the code, and submit it, everything works fine.

Upvotes: 2

Views: 818

Answers (1)

afeldman

Reputation: 512

It seems that part of the query is being eaten during input. One solution would be to pass the entire SELECT query as a single argument and read it into a string value; in that form it can be handed straight to the sql function to run your query. Below is how you can build out the function:

//The Package Tree
package stack.overFlow

//Import only what the function actually needs
import org.apache.spark.sql.{DataFrame, SparkSession}

//Object Name
object demoCode {
  def main(args: Array[String]): Unit = {
    //Build the session; SparkSession subsumes the old SparkContext/SQLContext entry points
    val spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    //Set the query as a string from argument 1
    val commandQuery: String = args(0)

    //Pass the query to the sql function
    val inputDF: DataFrame = spark.sql(commandQuery)

    //Use the result, e.g. print the first rows
    inputDF.show()
  }
}

Once the code compiles you need two things: (1) the jar, and (2) the package tree and class to run from it. With both of those supplied via --class, all you need to do is add a space and pass the SQL query through as a single quoted argument, so at run time it is loaded into the Spark session.

spark-submit --class stack.overFlow.demoCode /home/user/demo_code/target/demoCode-compilation-jar.jar \
"SELECT distinct DateMutation, NVoie, IndVoie, Voie, Valeur, CodeTypeLocal, TypeLocal, Commune FROM mutations WHERE (Commune = 'ICI') and (Valeur > 100000) and (CodeTypeLocal in (1, 2)) order by Valeur desc"
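
As a quick sanity check, a minimal sketch: print the raw arguments inside main to confirm the quoted query arrives as a single element. Without the surrounding quotes the shell splits the statement on whitespace and args(0) would contain only SELECT:

//Optional debug inside main: confirm the query arrived as one argument
args.zipWithIndex.foreach { case (a, i) => println(s"args($i) = $a") }
//Quoted:   args(0) = SELECT distinct DateMutation, ... order by Valeur desc
//Unquoted: args(0) = SELECT, args(1) = distinct, ...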

Would this help your use-case or do you need it to be in another format?

Upvotes: 2
