cgp25

Reputation: 345

Apache Spark with Python: error

New to Spark. Downloaded everything alright but when I run pyspark I get the following errors:

Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/05 20:46:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\bin\..\python\pyspark\shell.py", line 43, in <module>
spark = SparkSession.builder\
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\pyspark\sql\session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
File "C:\Users\Carolina\spark-2.1.0-bin-hadoop2.7\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\spark-2.1.0-bin-hadoop2.6\python\pyspark\sql\utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

Also, when I try (as recommended by http://spark.apache.org/docs/latest/quick-start.html)

textFile = sc.textFile("README.md")

I get:

NameError: name 'sc' is not defined

Any advice? Thank you!

Upvotes: 0

Views: 8120

Answers (8)

AKSHAY PANDYA

Reputation: 91

You need a compatible winutils.exe in the Hadoop bin directory.
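
As a rough sketch, assuming you unpack winutils.exe into a hypothetical C:\hadoop\bin (adjust the paths to your machine), you can also point the environment at it from Python before starting Spark:

import os

# Hypothetical layout: C:\hadoop\bin contains winutils.exe -- adjust to your setup
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] = os.environ["PATH"] + os.pathsep + r"C:\hadoop\bin"

# Set these before importing pyspark / creating a SparkSession,
# so the Hadoop-related code can locate winutils.exe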

Upvotes: 0

Jin Zhong

Reputation: 1

I came across this error:

raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'

This happened because I had already run ./bin/spark-shell.

So just kill that spark-shell session and re-run ./bin/pyspark.

Upvotes: 0

Huang

Reputation: 41

In my case, I had set up Hadoop in YARN mode, so my solution was to start HDFS and YARN:

start-dfs.sh
start-yarn.sh

Upvotes: 0

MrCartoonology

Reputation: 2067

I deleted the metastore_db directory and then things worked. I'm doing some light development on a MacBook. I had run PyCharm to sync my directory with the server, and I think it picked up that Spark-specific directory and corrupted it. For me the error message came when I was trying to start an interactive IPython pyspark shell.
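
If you want to script that cleanup, here is a minimal sketch; it assumes metastore_db and derby.log sit in the directory you launch pyspark from, which is where Spark creates them by default:

import os
import shutil

# Spark's embedded Derby metastore is created in the launch directory by default
if os.path.isdir("metastore_db"):
    shutil.rmtree("metastore_db")
if os.path.isfile("derby.log"):
    os.remove("derby.log")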

Upvotes: 0

AlessioX

Reputation: 3177

If you're on a Mac and you've installed Spark (and possibly Hive) through Homebrew, the answers from @Eric Pettijohn and @user7772046 will not work: the former because Homebrew's Spark already contains the aforementioned jar file, the latter because it is a purely Windows-based solution.

Inspired by this link and the permission issues hint, I've come up with the following simple solution: launch pyspark using sudo. No more Hive-related errors.

Upvotes: 1

user7772046

Reputation: 21

I also encountered this issue on Windows 7 with pre-built Spark 2.2. Here is a possible solution for Windows users:

  1. Make sure all the environment variables are set correctly, including SPARK_HOME, HADOOP_HOME, etc.

  2. Get the correct version of winutils.exe for your Spark-Hadoop prebuilt package.

  3. Then open a command prompt as Administrator and run this command:

    winutils chmod 777 C:\tmp\hive

    Note: The drive might be different depending on where you invoke pyspark or spark-shell

Credit should go to this link: see the answer by timesking.

Upvotes: 2

Eric Pettijohn

Reputation: 66

It looks like you've found the answer to the second part of your question in the answer above, but for future users arriving here via the 'org.apache.spark.sql.hive.HiveSessionState' error: this class lives in the spark-hive jar file, which does not come bundled with Spark unless it is built with Hive.

You can get this jar at:

http://central.maven.org/maven2/org/apache/spark/spark-hive_${SCALA_VERSION}/${SPARK_VERSION}/spark-hive_${SCALA_VERSION}-${SPARK_VERSION}.jar

You'll have to put it into your SPARK_HOME/jars folder, and then Spark should be able to find all of the Hive classes required.
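
Once the jar is in SPARK_HOME/jars, a quick sanity check (just a sketch, not part of the fix itself) is to build a Hive-enabled session and run a trivial query:

from pyspark.sql import SparkSession

# With the Hive classes on the classpath, this should no longer raise the
# "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'" exception
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("SHOW DATABASES").show()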

Upvotes: 2

lelabo_m

Reputation: 509

If you are running this from the pyspark console, it may be because your installation did not work.

If not, it's because most examples assume you are testing code in the pyspark console, where a default variable 'sc' exists.

You can create a SparkContext yourself at the beginning of your script using the following code:

from pyspark import SparkContext, SparkConf

conf = SparkConf()
sc = SparkContext(conf=conf)
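
With sc defined this way, the quick-start snippet from the question runs without the NameError (assuming README.md is in your working directory):

textFile = sc.textFile("README.md")
print(textFile.count())  # number of lines in the file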

Upvotes: 3
