Reputation: 133
With a fresh install of Spark 2.1, I am getting an error when executing the pyspark command.
Traceback (most recent call last):
File "/usr/local/spark/python/pyspark/shell.py", line 43, in <module>
spark = SparkSession.builder\
File "/usr/local/spark/python/pyspark/sql/session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/local/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
I have Hadoop and Hive on the same machine. Hive is configured to use MySQL for the metastore. I did not get this error with Spark 2.0.2.
Can someone please point me in the right direction?
Upvotes: 9
Views: 35239
Reputation: 1
The project location and file permissions could be the issue. I observed this error in spite of changes to my pom file. I then moved my project to my user directory, where I have full permissions, and that solved the issue.
Upvotes: 0
Reputation: 1373
I removed ".enableHiveSupport()\" from the shell.py file and it works perfectly.
Before:
spark = SparkSession.builder\
    .enableHiveSupport()\
    .getOrCreate()
After:
spark = SparkSession.builder\
    .getOrCreate()
Upvotes: 0
Reputation: 785
I was getting this error when trying to run pyspark and spark-shell while my HDFS wasn't started.
Upvotes: 0
Reputation: 23
I too was struggling in cluster mode. Adding hive-site.xml to the Spark conf directory fixed it; if you have an HDP cluster, that directory should be /usr/hdp/current/spark2-client/conf. It's working for me.
Upvotes: 0
Reputation: 4938
I saw this error on a new (2018) Mac, which came with Java 10. The fix was to set JAVA_HOME
to Java 8:
export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
Upvotes: 0
Reputation: 41987
The issue for me was solved by unsetting the HADOOP_CONF_DIR environment variable. It was pointing to the Hadoop configuration directory, and when starting the pyspark shell the variable made Spark try to connect to a Hadoop cluster that hadn't been started.
So if you have HADOOP_CONF_DIR set, either start the Hadoop cluster before using the Spark shells or unset the variable; a minimal sketch of the second option follows.
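The sketch below assumes Spark is launched from a standalone Python script rather than the pyspark shell (for the shell you would clear the variable in your environment before launching it); the app name is just illustrative:

import os
from pyspark.sql import SparkSession

# Assumption: a plain Python script, so the environment can be adjusted
# before the JVM gateway starts. Dropping HADOOP_CONF_DIR keeps Spark from
# trying to reach a Hadoop cluster that is not running.
os.environ.pop("HADOOP_CONF_DIR", None)

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("local-without-hadoop-conf") \
    .getOrCreate()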
Upvotes: 3
Reputation: 1033
I was getting the same error in a Windows environment, and the trick below worked for me.
In shell.py, the Spark session is defined with .enableHiveSupport():
spark = SparkSession.builder\
.enableHiveSupport()\
.getOrCreate()
Remove the Hive support and redefine the Spark session as below:
spark = SparkSession.builder\
.getOrCreate()
You can find shell.py in your Spark installation folder; for me it's in "C:\spark-2.1.1-bin-hadoop2.7\python\pyspark".
Hope this helps.
Upvotes: 17
Reputation: 557
You are missing the spark-hive jar.
For example, if you are running on Scala 2.11 with Spark 2.1, you can use this jar:
https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.11/2.1.0
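One way to pull that dependency in at session start is via spark.jars.packages; this is only a minimal sketch, assuming a standalone pyspark script (with the shells you would normally pass --packages on the command line instead):

from pyspark.sql import SparkSession

# Assumption: standalone script. spark.jars.packages asks Spark to resolve
# the spark-hive artifact from Maven when the session is created; the
# coordinates below match the Scala 2.11 / Spark 2.1.0 jar linked above.
spark = SparkSession.builder \
    .config("spark.jars.packages", "org.apache.spark:spark-hive_2.11:2.1.0") \
    .enableHiveSupport() \
    .getOrCreate()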
Upvotes: 0
Reputation: 994
I had the same problem. Some of the suggested answers, like sudo chmod -R 777 /tmp/hive/ or downgrading Spark with Hadoop to 2.6, didn't work for me.
I realized that what caused this problem for me was that I was doing SQL queries using the sqlContext instead of the sparkSession.
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.master("local[*]").appName("appName") \
    .config("spark.sql.warehouse.dir", "./spark-warehouse").getOrCreate()

sqlCtx.registerDataFrameAsTable(..)
df = sparkSession.sql("SELECT ...")
This works perfectly for me now.
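For reference, a fully SparkSession-based flow (no sqlContext at all) could look like the sketch below; the people.json file, the people view name, and the query are made-up placeholders, not from the answer:

from pyspark.sql import SparkSession

# A minimal sketch of doing everything through the SparkSession.
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("appName") \
    .config("spark.sql.warehouse.dir", "./spark-warehouse") \
    .getOrCreate()

# createOrReplaceTempView plays the role of sqlCtx.registerDataFrameAsTable.
df = spark.read.json("people.json")       # placeholder input file
df.createOrReplaceTempView("people")
adults = spark.sql("SELECT * FROM people WHERE age >= 18")
adults.show()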
Upvotes: 12
Reputation: 111
Spark 2.1.0: when I run it in yarn client mode I don't see this issue, but yarn cluster mode gives "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':".
Still looking for an answer.
Upvotes: 4