Reputation: 156
The scenario:
Let’s say a Company wants to send customers a discount coupon if they spent over $150, this "150" value of the “Discount_Threshold_Parameter” can change at the company’s discretion. This is going to be a parameter that is passed through a databricks notebook “Text” widget.
Let’s also say the company wants to only give discounts for certain product types and not others, the “Product_Type_Parameter” will also be a parameter containing a value like “Jewelry”.
I place these parameters in the Spark SQL query so that anytime the "total_order_amount" column value is greater than the "Discount_Threshold_Parameter" parameter the "Send_Customer_Discount_Coupon" column populates a value of "YES".
In the beginning this wasn't working when I put ‘Discount_Threshold_Parameter’, """Discount_Threshold_Parameter""", or, '""" Discount_Threshold_Parameter """' in the SQL.
It only worked when I put both the open and closing single quote, triple quotes, and the concatenation operator '"""+ Discount_Threshold_Parameter +"""', why is that?
I'm glad it worked but I'm not understanding why?
The example:
Azure Databricks Notebook Widget name and value: Discount_Threshold_Parameter = 150
Azure Databricks Notebook Widget name and value: Product_Type_Parameter = Jewelry
I then pass the parameters to the SQL shown below:
display(spark.sql("""
SELECT
order_id,
order_date,
order_country,
ProductType,
total_order_amount,
'"""+Discount_Threshold_Parameter+"""' AS Company_Specified_Discount,
(CASE
WHEN total_order_amount > '"""+Discount_Threshold_Parameter+'""" THEN 'YES'
ELSE 'NO'
END) AS Send_Customer_Discount_Coupon
FROM Orders
WHERE ProductType = '"""+Product_Type_Parameter+"""'
""")
The above works when placing the parameter in open and closing single quotes, triple quotes, and the concatenation operator but I’m not understanding why all of those have to be used?
Another solution I’ve found that works is using f-strings:
df.createOrReplaceTempView("Orders")
display(spark.sql(f"""
SELECT
order_id,
order_date,
order_country,
Product_Type,
total_order_amount,
{Discount_threshold} AS Company_Specified_Discount,
(CASE
WHEN total_order_amount > {Discount_threshold} THEN 'YES'
ELSE 'NO'
END) AS Send_Customer_Discount_Coupon
FROM Orders
WHERE ProductType = '{Product_Type_Parameter}'
""")
Upvotes: 0
Views: 2200
Reputation: 6114
You can debug what is happening in the following way:
spark.sql()
and assigned it to a string. For demo, I have assigned the variable values directly. Since you are getting the value from text widget, the type is going to be string.Discount_Threshold_Parameter='150'
Product_Type_Parameter = 'Jewelry'
You have to use """+variable+"""
in between because you want to concatenate thier value to your query. Since you are using multiline string as a query, you need to close the triple quotes first, concatenate and open them again (Just like 'My name is'+ first_name+' Cena'
where first_name has a string).
While using +
to concatenate along with additional single quotes ('"""+discount_threshold_parameter+"""'
), it produces '150' inside the query. Look the the following to understand it better. I assigned query to a variable and printed it
query = """SELECT
order_id,
order_date,
order_country,
ProductType,
total_order_amount,
'"""+Discount_Threshold_Parameter+"""' AS Company_Specified_Discount,
(CASE
WHEN total_order_amount > '"""+Discount_Threshold_Parameter+"""' THEN 'YES'
ELSE 'NO'
END) AS Send_Customer_Discount_Coupon
FROM Orders
WHERE ProductType = '"""+Product_Type_Parameter+"""'
"""
#output
SELECT
order_id,
order_date,
order_country,
ProductType,
total_order_amount,
'150' AS Company_Specified_Discount,
(CASE
WHEN total_order_amount > '150' THEN 'YES'
ELSE 'NO'
END) AS Send_Customer_Discount_Coupon
FROM Orders
WHERE ProductType = 'Jewelry'
You can use ' """ +variable+ """ '
when that particular value is a String type in your table, in this case for Product_Type_Parameter
you can use this but not for Discount_Threshold_Parameter
(because you would be comparing total_order_amount to a string i.e., total_order_amount > '150' instead of total_order_amount > 150)
F-strings are much more convenient to use here without any confusion. Specifying something inside {}
would help python interpret it as variable. The following is the query output we would get using the f-string you used.
SELECT
order_id,
order_date,
order_country,
Product_Type,
total_order_amount,
150 AS Company_Specified_Discount,
(CASE
WHEN total_order_amount > 150 THEN 'YES'
ELSE 'NO'
END) AS Send_Customer_Discount_Coupon
FROM Orders
WHERE ProductType = 'Jewelry'
NOTE:
spark.sql()
.Upvotes: 1