Reputation: 2821
I have the following code in T-SQL
IF NOT EXISTS ( SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'airports' AND COLUMN_NAME = 'airport_region') SELECT * FROM airports;
I would like to convert the above T-SQL to Pyspark.
I have the following dataframe
df = df1.createOrReplaceTempView('airports')
My attempt at converting the above is as follows:
sql("""IF NOT EXISTS(SELECT * FROM airports where table = airports and COLUMN = 'airport_region') select * from airports""")
The above gives me a ParseException: error.
Any thoughts?
Upvotes: 0
Views: 711
Reputation: 11529
I have reproduced the above and if you want to do it in Pyspark you can use the above query and inside if, execute the SQL script.
df.createOrReplaceTempView("sample1")
if('name1' not in df.columns):
spark.sql("select * from sample1").show()
If you want to do it in SQL query you can try like below.
First get the columns names as dataframe and save as a temporary view and using that view, if the your column name not exists in that select the required table.
column_names=spark.sql("show columns in sample1");
column_names.createOrReplaceTempView("tempcols")
spark.sql("select * from sample1 where not exists (select * from tempcols where col_name='name1')").show()
If column exists:
Upvotes: 1