JTD2021
JTD2021

Reputation: 156

Not understanding why I have to use single and triple quotes along with the concatenation operator to pass parameters in Azure Databricks

The scenario:

  1. Let’s say a Company wants to send customers a discount coupon if they spent over $150, this "150" value of the “Discount_Threshold_Parameter” can change at the company’s discretion. This is going to be a parameter that is passed through a databricks notebook “Text” widget.

  2. Let’s also say the company wants to only give discounts for certain product types and not others, the “Product_Type_Parameter” will also be a parameter containing a value like “Jewelry”.

I place these parameters in the Spark SQL query so that anytime the "total_order_amount" column value is greater than the "Discount_Threshold_Parameter" parameter the "Send_Customer_Discount_Coupon" column populates a value of "YES".

In the beginning this wasn't working when I put ‘Discount_Threshold_Parameter’, """Discount_Threshold_Parameter""", or, '""" Discount_Threshold_Parameter """' in the SQL.

It only worked when I put both the open and closing single quote, triple quotes, and the concatenation operator '"""+ Discount_Threshold_Parameter +"""', why is that?

I'm glad it worked but I'm not understanding why?

The example:

Azure Databricks Notebook Widget name and value: Discount_Threshold_Parameter = 150

Azure Databricks Notebook Widget name and value: Product_Type_Parameter = Jewelry

I then pass the parameters to the SQL shown below:

display(spark.sql(""" 
                    SELECT 
                    order_id, 
                    order_date, 
                    order_country,
                    ProductType,
                    total_order_amount, 
                    '"""+Discount_Threshold_Parameter+"""' AS Company_Specified_Discount, 
                    (CASE 
                    WHEN total_order_amount > '"""+Discount_Threshold_Parameter+'""" THEN 'YES' 
                    ELSE 'NO' 
                    END) AS Send_Customer_Discount_Coupon 
                    FROM Orders 
                    WHERE ProductType = '"""+Product_Type_Parameter+"""'
                 """)

The above works when placing the parameter in open and closing single quotes, triple quotes, and the concatenation operator but I’m not understanding why all of those have to be used?

Another solution I’ve found that works is using f-strings:

df.createOrReplaceTempView("Orders") 

display(spark.sql(f""" 
                    SELECT 
                    order_id, 
                    order_date, 
                    order_country,
                    Product_Type,
                    total_order_amount, 
                    {Discount_threshold} AS Company_Specified_Discount, 
                    (CASE 
                    WHEN total_order_amount > {Discount_threshold} THEN 'YES' 
                    ELSE 'NO' 
                    END) AS Send_Customer_Discount_Coupon 
                    FROM Orders 
                    WHERE ProductType = '{Product_Type_Parameter}' 
                 """)

Upvotes: 0

Views: 2200

Answers (1)

Saideep Arikontham
Saideep Arikontham

Reputation: 6114

You can debug what is happening in the following way:

  • I have taken the string query you are passing to spark.sql() and assigned it to a string. For demo, I have assigned the variable values directly. Since you are getting the value from text widget, the type is going to be string.
Discount_Threshold_Parameter='150'
Product_Type_Parameter = 'Jewelry'
  • You have to use """+variable+""" in between because you want to concatenate thier value to your query. Since you are using multiline string as a query, you need to close the triple quotes first, concatenate and open them again (Just like 'My name is'+ first_name+' Cena' where first_name has a string).

  • While using + to concatenate along with additional single quotes ('"""+discount_threshold_parameter+"""'), it produces '150' inside the query. Look the the following to understand it better. I assigned query to a variable and printed it

query = """SELECT 
           order_id, 
           order_date, 
           order_country,
           ProductType,
           total_order_amount, 
           '"""+Discount_Threshold_Parameter+"""' AS Company_Specified_Discount, 
           (CASE 
           WHEN total_order_amount > '"""+Discount_Threshold_Parameter+"""' THEN 'YES'
           ELSE 'NO' 
           END) AS Send_Customer_Discount_Coupon 
           FROM Orders 
           WHERE ProductType = '"""+Product_Type_Parameter+"""'
"""
#output

SELECT 
           order_id, 
           order_date, 
           order_country,
           ProductType,
           total_order_amount, 
           '150' AS Company_Specified_Discount, 
           (CASE 
           WHEN total_order_amount > '150' THEN 'YES'
           ELSE 'NO' 
           END) AS Send_Customer_Discount_Coupon 
           FROM Orders 
           WHERE ProductType = 'Jewelry'
  • You can use ' """ +variable+ """ ' when that particular value is a String type in your table, in this case for Product_Type_Parameter you can use this but not for Discount_Threshold_Parameter (because you would be comparing total_order_amount to a string i.e., total_order_amount > '150' instead of total_order_amount > 150)

  • F-strings are much more convenient to use here without any confusion. Specifying something inside {} would help python interpret it as variable. The following is the query output we would get using the f-string you used.

SELECT 
                    order_id, 
                    order_date, 
                    order_country,
                    Product_Type,
                    total_order_amount, 
                    150 AS Company_Specified_Discount, 
                    (CASE 
                    WHEN total_order_amount > 150 THEN 'YES' 
                    ELSE 'NO' 
                    END) AS Send_Customer_Discount_Coupon 
                    FROM Orders 
                    WHERE ProductType = 'Jewelry' 
  • Here, discount is compared as an integer and jewelry as a string.

NOTE:

  • You can use single quotes in this requirement if the value you want to apply condition on, is a string (either F-strings or concatenation) when writing multiline query (as a String).
  • You can follow similar procedure, assign query to a variable and print it to understand what query(string) you will be using inside spark.sql().

Upvotes: 1

Related Questions