Reputation: 345
New to Spark. I downloaded everything alright, but when I run pyspark I get the following errors:
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/05 20:46:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\bin\..\python\pyspark\shell.py", line 43, in <module>
spark = SparkSession.builder\
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\pyspark\sql\session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\pyspark\sql\utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
Also, when I try (as recommended by http://spark.apache.org/docs/latest/quick-start.html)
textFile = sc.textFile("README.md")
I get:
NameError: name 'sc' is not defined
Any advice? Thank you!
Upvotes: 0
Views: 8120
Reputation: 1
I came across this error:
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'
This happened because I had already run ./bin/spark-shell; the running shell was most likely holding the lock on the embedded Derby metastore, which only allows one active session.
So just kill that spark-shell and re-run ./bin/pyspark.
Upvotes: 0
Reputation: 41
In my case the problem was that I had configured Hadoop to run in YARN mode, so my solution was to start HDFS and YARN first:
start-dfs.sh
start-yarn.sh
Upvotes: 0
Reputation: 2067
I deleted the metastore_db directory and then things worked. I'm doing some light development on a MacBook; I had run PyCharm to sync my directory with the server, and I think it picked up that Spark-specific directory and messed it up. For me, the error message came when I was trying to start an interactive IPython pyspark shell.
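If you want to script that cleanup, here is a minimal sketch, assuming the stale metastore_db sits in the directory you launch pyspark from (adjust the path otherwise):

import shutil
from pathlib import Path

# Remove the local Derby metastore so Spark can recreate it cleanly
# on the next launch.
metastore = Path.cwd() / "metastore_db"
if metastore.is_dir():
    shutil.rmtree(metastore)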
Upvotes: 0
Reputation: 3177
If you're on a Mac and you've installed Spark (and possibly Hive) through Homebrew, the answers from @Eric Pettijohn and @user7772046 will not work: the former because Homebrew's Spark already contains the aforementioned jar file, the latter because it is a purely Windows-based solution.
Inspired by this link and the permission-issues hint, I've come up with the following simple solution: launch pyspark using sudo. No more Hive-related errors.
Upvotes: 1
Reputation: 21
I also encountered this issue on Windows 7 with pre-built Spark 2.2. Here is a possible solution for Windows users:

1. Make sure all of your environment variables are set correctly, including SPARK_HOME, HADOOP_HOME, etc. (a quick sanity-check sketch follows below).
2. Get the correct version of winutils.exe for the Spark-Hadoop prebuilt package.
3. Open a cmd prompt as Administrator and run this command:

winutils chmod 777 C:\tmp\hive

Note: The drive might be different depending on where you invoke pyspark or spark-shell.
This link should take the credit: see the answer by timesking
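As a quick way to verify step 1, here is a small sketch (SPARK_HOME and HADOOP_HOME are the standard variable names; add any others you rely on) that prints what the Python process actually sees before you launch pyspark:

import os

# Print the environment variables Spark depends on; "<not set>" means the
# variable is missing from the environment the shell was started from.
for var in ("SPARK_HOME", "HADOOP_HOME"):
    print(var, "=", os.environ.get(var, "<not set>"))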
Upvotes: 2
Reputation: 66
It looks like you've found the answer to the second part of your question in the above answer, but for future users getting here via the 'org.apache.spark.sql.hive.HiveSessionState' error: this class is found in the spark-hive jar file, which does not come bundled with Spark if it isn't built with Hive.
You can get this jar at:
http://central.maven.org/maven2/org/apache/spark/spark-hive_${SCALA_VERSION}/${SPARK_VERSION}/spark-hive_${SCALA_VERSION}-${SPARK_VERSION}.jar
You'll have to put it into your SPARK_HOME/jars folder, and then Spark should be able to find all of the Hive classes required.
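Once the jar is in place, a minimal sketch to check that the Hive classes are reachable (the app name is just an illustrative placeholder):

from pyspark.sql import SparkSession

# Building a Hive-enabled session exercises the same code path that raised
# the HiveSessionState error, so this fails fast if the jar is still missing.
spark = (SparkSession.builder
         .appName("hive-classpath-check")  # placeholder name
         .enableHiveSupport()
         .getOrCreate())
spark.sql("SHOW DATABASES").show()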
Upvotes: 2
Reputation: 509
If you are doing it from the pyspark console, it may be because your installation did not work.
If not, it's because most examples assume you are testing code in the pyspark console, where a default variable 'sc' exists.
You can create a SparkContext by yourself at the beginning of your script using the following code:
from pyspark import SparkContext, SparkConf

conf = SparkConf()            # default configuration; add settings with conf.set(...) if needed
sc = SparkContext(conf=conf)  # 'sc' is the SparkContext the examples expect
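To tie this back to the question, once sc exists the quick-start example works as expected (assuming README.md is in your working directory):

textFile = sc.textFile("README.md")
print(textFile.count())  # number of lines in the file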
Upvotes: 3