Arvind Kumar
Arvind Kumar

Reputation: 1335

Hive Error : Unsupported language features in query in Spark execution

Spark SQL Hive error for NOT EXISTS clause in sql query.

Platform : cdh5.6.0

Hive version: Hive 1.1.0

The below NOT EXISTS query is running fine in hive prompt: SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c)

But the same program is giving error "Unsupported language features in query" in spark execution.

from pyspark import SparkContext
sc =SparkContext()


from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

#sqlContext.sql("INSERT INTO abc_p PARTITION (SRVC_TYPE_CD='1') SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c)")

Execution:

spark-submit --verbose --deploy-mode client /data/abc.py

Error message:

Unsupported language features in query: INSERT INTO abc_p PARTITION (SRVC_TYPE_CD='1') SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c))

I think sqlContext.sql is not supporting NOT EXISTS in hive queries. Could you please suggest some solution/alternatives.

Upvotes: 0

Views: 3211

Answers (2)

Rakesh Kumar
Rakesh Kumar

Reputation: 4420

I had same problem, below solution worked for me. Put these lines in your file and test:-

from pyspark import SparkContext
sc =SparkContext()
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

df = sqlContext.sql("SELECT a,b,c,d FROM interim_t WHERE NOT EXISTS (SELECT a FROM xyz_n ABC where (a=a) AND (b=b) AND (c=c)")
df.write.mode("overwrite").partitionBy("SRVC_TYPE_CD").saveAsTable("abc_p")

Apart from this you can try some more options like mode can be append. You can choose format for saving too. Like mode("xxx").format("parquet"). Format can be parquet, orc etc.

Upvotes: 0

vikrame
vikrame

Reputation: 485

I tried below on pyspark shell, executed just fine with no errors.

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

sqlContext.sql("SELECT a,b,c,d FROM table1 i WHERE  NOT EXISTS  (SELECT a FROM table2 x where i.a=x.a AND i.b=x.b AND i.c=x.c)");

I have following content in test.py

from pyspark import SparkContext
sc =SparkContext()

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

sqlContext.sql("SELECT a,b,c,d FROM table1 i WHERE  NOT EXISTS  (SELECT a FROM table2 x where i.a=x.a AND i.b=x.b AND i.c=x.c)");

Executed the script

spark-submit --verbose --deploy-mode client test.py

Executed successfully. can you give a try ?

My set up Hive 2.1.0 and Spark 2.0.2

I suspect your hive version is the issue

Upvotes: 0

Related Questions