dommer

Reputation: 19820

ClassNotFoundException when submitting JAR to Spark via spark-submit

I'm struggling to submit a JAR to Apache Spark using spark-submit.

To keep things simple, I've been experimenting with the example from this blog post. The code is:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object SimpleScalaSpark { 
  def main(args: Array[String]) {
    val logFile = "/Users/toddmcgrath/Development/spark-1.6.1-bin-hadoop2.4/README.md" // I've replaced this with the path to an existing file
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

I'm building this with IntelliJ IDEA 2017.1 and running it on Spark 2.1.0. Everything works fine when I run it in the IDE.

I then build it as a JAR and attempt to run it with spark-submit as follows:

./spark-submit --class SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar

This results in the following error:

java.lang.ClassNotFoundException: SimpleScalaSpark
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I'm at a loss as to what I'm missing, especially given that it runs as expected in the IDE.

Upvotes: 4

Views: 2906

Answers (4)

cerebrotecnologico

Reputation: 247

I am observing ClassNotFoundException on new classes I introduce. I am using a fat jar, and I have verified that the copy of the JAR file on each node contains the new class file. (I am loading the Spark application from the regular filesystem, not HDFS or an HTTP URL.) Yet the JAR the worker actually loads is an older version that lacks the new class. The only workaround I have found is to use a different filename for the JAR every time I call the spark-submit script.
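A minimal sketch of that workaround, assuming a fat jar built to target/ (the paths and jar name here are hypothetical):

# Append a timestamp to the jar name so workers never reuse a stale cached copy
STAMP=$(date +%s)
cp target/myapp-assembly.jar /tmp/myapp-assembly-$STAMP.jar
./spark-submit --class SimpleScalaSpark --master local[*] /tmp/myapp-assembly-$STAMP.jar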

Upvotes: 0

shants

Reputation: 612

Looks like there is an issue with your jar. You can check which classes are present in your jar by using the command: vi supersimple.jar (vim lists the archive's entries).

If the SimpleScalaSpark class does not appear in the output of the previous command, your jar was not built properly.
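As an alternative to vi, the standard JDK jar tool also lists an archive's entries; a quick check could look like this (assuming the jar sits in the current directory):

# List the jar's contents and look for the compiled class files
jar tf supersimple.jar | grep SimpleScalaSpark
# A correctly built jar should contain SimpleScalaSpark.class (plus SimpleScalaSpark$.class for the Scala object)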

Upvotes: 1

Sergio Alyoshkin

Reputation: 212

IDEs work differently from the shell in many ways. I believe that from the shell you need to add the --jars parameter:

spark submit add multiple jars in classpath
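For illustration, a sketch of the submit command with extra jars on the classpath (the dependency paths are hypothetical):

# --jars takes a comma-separated list of jars to add to the driver and executor classpaths
./spark-submit --class SimpleScalaSpark --master local[*] \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar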

Upvotes: 0

Lovish chaudhary

Reputation: 101

As per your description above, you are not giving the correct class name, so Spark is not able to find that class.

Just replace SimpleSparkScala with SimpleScalaSpark.

Try running this command:

./spark-submit --class SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar

Upvotes: 1
