Reputation: 21
I have looked all over for an answer to this and tried everything, but nothing seems to work. I'm trying to reference a variable assignment within a spark.sql query in Python. I'm running Python 3 and Spark version 2.3.1.
bkt = 1
prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
FROM pwrcrv_tmp\
where EXTR_CURR_NUM_CYC_DLQ=$bkt\
and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
group by ((year(fdr_date))*100)+month(fdr_date)\
order by ((year(fdr_date))*100)+month(fdr_date)")
prime.show(50)
The error:
prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts FROM pwrcrv_tmp where EXTR_CURR_NUM_CYC_DLQ=$bkt and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA') group by ((year(fdr_date))*100)+month(fdr_date) order by ((year(fdr_date))*100)+month(fdr_date)")
^
SyntaxError: invalid syntax
Upvotes: 1
Views: 5728
Reputation: 2490
Since your query spans multiple lines, it is good coding style to use triple quotes (""" """) and avoid the backslash ("\") line continuations altogether; the backslashes cause a lot of issues when parsed by Python and PySpark in general, and also when using nbconvert to convert a notebook to a script. For the variable, either wrap it in braces {} inside an f-string, or use .format(bkt).
Option with {}:
bkt = 1
prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
FROM pwrcrv_tmp
where EXTR_CURR_NUM_CYC_DLQ={bkt}
and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
group by ((year(fdr_date))*100)+month(fdr_date)
order by ((year(fdr_date))*100)+month(fdr_date)""")
prime.show(50)
Option with .format():
bkt = 1
prime = spark.sql("""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
FROM pwrcrv_tmp
where EXTR_CURR_NUM_CYC_DLQ={}
and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
group by ((year(fdr_date))*100)+month(fdr_date)
order by ((year(fdr_date))*100)+month(fdr_date)""".format(bkt))
prime.show(50)
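Both options build the same query string; the f-string form needs Python 3.6 or later, while .format() also works on older Python 3 releases.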
Upvotes: 0
Reputation: 1
p_filename = 'some value'
z = 'some value'
# build the SQL text with str.format(), then execute it
query = "INSERT into default.loginfordetails (filename,logdesc) values ('{}','{}')".format(p_filename, z)
spark.sql(query)
Upvotes: 0
Reputation: 21
I found the correct syntax buried in this databricks post.
https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html
You add a lowercase f in front of the query string and wrap braces around the name of the variable inside the query.
bkt = 1
prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
FROM pwrcrv_tmp\
where EXTR_CURR_NUM_CYC_DLQ={bkt}\
and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
group by ((year(fdr_date))*100)+month(fdr_date)\
order by ((year(fdr_date))*100)+month(fdr_date)")
prime.show(50)
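For anyone curious what the f prefix actually does: it just substitutes the current value of bkt into the query text before Spark ever parses it. A minimal sketch (the sql_text name is only for illustration):
bkt = 1
sql_text = f"SELECT count(*) FROM pwrcrv_tmp where EXTR_CURR_NUM_CYC_DLQ={bkt}"
print(sql_text)  # prints: SELECT count(*) FROM pwrcrv_tmp where EXTR_CURR_NUM_CYC_DLQ=1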
Upvotes: 1