Patterson

Reputation: 2821

How to Convert T-SQL IF statement to Databricks PySpark

I have the following code in T-SQL

IF NOT EXISTS (
    SELECT *
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'airports'
      AND COLUMN_NAME = 'airport_region'
)
SELECT * FROM airports;

I would like to convert the above T-SQL to PySpark.

I have registered the following dataframe as a temporary view:

df1.createOrReplaceTempView('airports')

My attempt at converting the above is as follows:

sql("""IF NOT EXISTS(SELECT * FROM airports  where table = airports and COLUMN = 'airport_region') select * from airports""")

The above gives me a ParseException error.

Any thoughts?

Upvotes: 0

Views: 711

Answers (2)

Rakesh Govindula

Reputation: 11529

I have reproduced the above. If you want to do it in PySpark, check the column list in Python and execute the SQL inside the if block:

df.createOrReplaceTempView("sample1")

# Run the query only when the column is absent
if 'name1' not in df.columns:
    spark.sql("select * from sample1").show()


If you want to do it purely in SQL, you can try the following.

First, get the column names as a dataframe and save it as a temporary view. Then, using that view, select from the required table only if the column does not exist in it.

# Column names of sample1 as a one-column dataframe (col_name)
column_names = spark.sql("show columns in sample1")
column_names.createOrReplaceTempView("tempcols")

# Returns rows only when 'name1' is not among the columns
spark.sql("select * from sample1 where not exists (select * from tempcols where col_name='name1')").show()


If the column does exist, the NOT EXISTS subquery is false for every row, so no rows are returned.

Upvotes: 1

ARCrow

Reputation: 1857

Try this:

if 'airport_region' not in df1.columns:
    # do stuff, e.g. run your SELECT against the view
    ...

Upvotes: 0
