Reputation: 1584
I am writing spark code in python. How do I pass a variable in a spark.sql query?
q25 = 500
Q1 = spark.sql("SELECT col1 from table where col2>500 limit $q25 , 1")
Currently, the above code does not work. How do we pass variables?
I have also tried:
Q1 = spark.sql("SELECT col1 from table where col2>500 limit q25='{}' , 1".format(q25))
Upvotes: 31
Views: 81222
Reputation: 2436
Using the f-string approach (PySpark):
table = 'my_schema.my_table'
df = spark.sql(f'select * from {table}')
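Applied to the limit variable from the question, a minimal sketch (assuming a table named table exists) would be:
q25 = 500
df = spark.sql(f"SELECT col1 from table where col2>500 limit {q25}")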
Upvotes: 9
Reputation: 1263
A really easy solution is to store the query as a string (using the usual Python string formatting) and then pass it to the spark.sql() function:
q25 = 500
query = "SELECT col1 from table where col2>500 limit {}".format(q25)
Q1 = spark.sql(query)
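If the query ends up with several placeholders, named fields (plain str.format, nothing Spark-specific) can keep it readable; a small sketch using the same hypothetical table:
q25 = 500
query = "SELECT col1 from table where col2>500 limit {limit}".format(limit=q25)
Q1 = spark.sql(query)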
Upvotes: 2
Reputation: 2124
Another option if you're doing this sort of thing often or want to make your code easier to re-use is to use a map of configuration variables and the format option:
configs = {"q25":10,
"TABLE_NAME":"my_table",
"SCHEMA":"my_schema"}
Q1 = spark.sql("""SELECT col1 from {SCHEMA}.{TABLE_NAME}
where col2>500
limit {q25}
""".format(**configs))
Upvotes: 5
Reputation: 9
In Scala, all you need to do is prefix the string with s (the string interpolator). This allows variables to be used directly inside the string.
val q25 = 10
val Q1 = spark.sql(s"SELECT col1 from table where col2>500 limit $q25")
Upvotes: 0
Reputation: 6556
You need to remove the single quotes and the q25 from the string formatting, like this:
Q1 = spark.sql("SELECT col1 from table where col2>500 limit {}, 1".format(q25))
Update:
Based on your new queries:
spark.sql("SELECT col1 from table where col2>500 order by col1 desc limit {}, 1".format(q25))
Note that Spark SQL does not support OFFSET, so that query cannot work.
If you need to add multiple variables, you can do it this way:
q25 = 500
var2 = 50
Q1 = spark.sql("SELECT col1 from table where col2>{0} limit {1}".format(var2,q25))
Upvotes: 24