Reputation: 1335
Spark SQL Hive error for NOT EXISTS clause in sql query.
Platform : cdh5.6.0
Hive version: Hive 1.1.0
The NOT EXISTS query below runs fine at the Hive prompt: SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c))
But the same query fails with "Unsupported language features in query" when executed through Spark.
from pyspark import SparkContext
sc =SparkContext()
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("INSERT INTO abc_p PARTITION (SRVC_TYPE_CD='1') SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c))")
Execution:
spark-submit --verbose --deploy-mode client /data/abc.py
Error message:
Unsupported language features in query: INSERT INTO abc_p PARTITION (SRVC_TYPE_CD='1') SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c))
It seems sqlContext.sql does not support NOT EXISTS in Hive queries. Could you please suggest a solution or alternatives?
Upvotes: 0
Views: 3211
Reputation: 4420
I had the same problem; the solution below worked for me. Put these lines in your file and test:
from pyspark import SparkContext
sc =SparkContext()
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
df = sqlContext.sql("SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c))")
df.write.mode("overwrite").partitionBy("SRVC_TYPE_CD").saveAsTable("abc_p")
Apart from this, you can try some more options: mode can also be "append". You can choose a format for saving too, like mode("xxx").format("parquet"); the format can be "parquet", "orc", etc.
Upvotes: 0
Reputation: 485
I tried the below in the pyspark shell; it executed just fine, with no errors.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("SELECT a,b,c,d FROM table1 i WHERE NOT EXISTS (SELECT a FROM table2 x where i.a=x.a AND i.b=x.b AND i.c=x.c)")
I have the following content in test.py:
from pyspark import SparkContext
sc =SparkContext()
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("SELECT a,b,c,d FROM table1 i WHERE NOT EXISTS (SELECT a FROM table2 x where i.a=x.a AND i.b=x.b AND i.c=x.c)")
Executed the script:
spark-submit --verbose --deploy-mode client test.py
It executed successfully. Can you give it a try?
My setup: Hive 2.1.0 and Spark 2.0.2.
I suspect your Hive version is the issue.
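As a side check, the anti-join semantics of this correlated NOT EXISTS can be verified on toy data with Python's built-in sqlite3 module, independent of the Hive or Spark version. The table names mirror the query above; the sample rows are invented for illustration:

```python
import sqlite3

# In-memory database with the two tables from the query above
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (a, b, c, d)")
cur.execute("CREATE TABLE table2 (a, b, c)")

# One row of table1 matches table2 on (a, b, c); the other does not
cur.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                [(1, 1, 1, "no match"), (2, 2, 2, "has match")])
cur.execute("INSERT INTO table2 VALUES (2, 2, 2)")

# Same correlated NOT EXISTS shape as in the answer above: keep only
# rows of table1 that have no matching row in table2
rows = cur.execute(
    "SELECT a,b,c,d FROM table1 i WHERE NOT EXISTS "
    "(SELECT a FROM table2 x WHERE i.a=x.a AND i.b=x.b AND i.c=x.c)"
).fetchall()
print(rows)  # only the unmatched row survives: [(1, 1, 1, 'no match')]
```

If this logic is what you want, the remaining question is only whether your Spark/Hive combination can parse the query, not what it should return.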
Upvotes: 0