Emiliano

Reputation: 357

ERROR SparkContext Failed to add file in Apache Spark 2.1.1

I've been using Apache Spark for quite a while now, but I'm getting an error that never happened before when running the following example (I've just updated to Spark 2.1.1):

./opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/bin/run-example SparkPi

Here is the actual stacktrace:

    17/07/05 10:50:54 ERROR SparkContext: Failed to add file:/opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-warehouse/ to Spark environment
java.lang.IllegalArgumentException: Directory /opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-warehouse is not allowed for addJar
        at org.apache.spark.SparkContext.liftedTree1$1(SparkContext.scala:1735)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1729)
        at org.apache.spark.SparkContext$$anonfun$11.apply(SparkContext.scala:466)
        at org.apache.spark.SparkContext$$anonfun$11.apply(SparkContext.scala:466)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:466)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Pi is roughly 3.1433757168785843

I don't know if it is actually an error or if I'm missing something, because the example runs anyway; you can see the "Pi is roughly..." result at the end.

Here are the configuration lines for spark-env.sh:

export SPARK_MASTER_IP=X.X.X.X
export SPARK_MASTER_WEBUI_PORT=YYYY
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=7g

Here are the configuration lines for spark-defaults.conf:

spark.master local[*]
spark.driver.cores 4
spark.driver.memory 2g
spark.executor.cores 4
spark.executor.memory 4g
spark.ui.showConsoleProgress false
spark.driver.extraClassPath /opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/lib/postgresql-9.4.1207.jar
spark.eventLog.enabled true
spark.eventLog.dir file:///opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/logs
spark.history.fs.logDirectory file:///opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/logs

Apache Spark version: 2.1.1

Java version: 1.8.0_91

Python version: 2.7.5

I've tried configuring it with this, with no success:

spark.sql.warehouse.dir file:///c:/tmp/spark-warehouse

It is weird because when I compile a script and launch it with spark-submit I don't get this error. I didn't find any JIRA tickets or anything related.

Upvotes: 4

Views: 5123

Answers (2)

barath

Reputation: 842

I had a similar issue with my Java Spark code. Even though your issue is with Python/Spark, maybe this will help you or someone else.

I had to specify some dependency jars to Spark using the --jars option. Initially I passed the path to the directory that contains all the dependency jars (i.e. --jars <path-to-dependency>/) and got the above error.

The --jars option (of spark-submit) seems to accept only paths to actual jar(s) (<path-to-directory>/<name>.jar), not just the directory path (<path-to-directory>/).

The issue was resolved for me when I packaged all the dependencies into a single dependency jar and specified that with the --jars option, as below:

~/spark/bin/spark-submit --class "<class-name>" --jars '<path-to-dependency-jars>/<dependency-jar>.jar' --master local <dependency-jar>.jar <input-val1> <input-val2>
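
If you'd rather not build a single dependency jar, --jars also accepts a comma-separated list of individual jar files (the paths and names below are just placeholders):

~/spark/bin/spark-submit --class "<class-name>" --jars '<path>/dep1.jar,<path>/dep2.jar' --master local <application-jar>.jar <input-val1> <input-val2>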

Upvotes: 2

ahidri

Reputation: 129

Somewhere in the code, the SparkContext is being told to add /opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-warehouse as a jar. This is not allowed, so it throws a java.lang.IllegalArgumentException.

You can see this at line 1812 of SparkContext.scala: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala
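
Roughly, the validation that addJar does for a local path looks something like this (a simplified Scala sketch inferred from the exception message, not the exact Spark source):

    // Simplified sketch of the kind of check SparkContext.addJar performs
    // for a local path; not the actual Spark implementation.
    import java.io.File

    def validateJarPath(path: String): Unit = {
      val file = new File(path)
      if (!file.exists()) {
        throw new java.io.FileNotFoundException(s"Jar $path not found")
      }
      if (file.isDirectory) {
        // This is the case hit here: a directory (spark-warehouse) was passed
        // where a .jar file was expected.
        throw new IllegalArgumentException(s"Directory $path is not allowed for addJar")
      }
    }

So whatever ends up in the jar list (in your case the spark-warehouse directory under examples/jars) must point to an actual .jar file, not a directory.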

Upvotes: 0
