Reputation: 19820
I'm struggling to submit a JAR to Apache Spark using spark-submit.
To make things easier, I've been experimenting with the example from this blog post. The code is:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object SimpleScalaSpark {
  def main(args: Array[String]) {
    val logFile = "/Users/toddmcgrath/Development/spark-1.6.1-bin-hadoop2.4/README.md" // I've replaced this with the path to an existing file
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
I'm building this with IntelliJ IDEA 2017.1 and running it on Spark 2.1.0. Everything works fine when I run it in the IDE.
I then build it as a JAR and attempt to use spark-submit as follows:
./spark-submit --class SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar
This results in the following error:
java.lang.ClassNotFoundException: SimpleScalaSpark
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I'm at a loss as to what I'm missing... especially given that it runs as expected in the IDE.
Upvotes: 4
Views: 2906
Reputation: 247
I am observing ClassNotFoundException on new classes I introduce, even though I am using a fat JAR. I verified that the JAR file contains the new class file in every copy on each node (I am loading the Spark application from the regular filesystem, not HDFS or an HTTP URL). Yet the JAR actually loaded by the worker does not have the new class; it is an older version. The only way I found to get around the problem is to use a different filename for the JAR every time I call the spark-submit script.
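A minimal sketch of that workaround, assuming a local setup (the timestamped filename is just one way to force a fresh copy; the paths are illustrative):
JAR=supersimple-$(date +%s).jar
cp ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar "$JAR"
./spark-submit --class SimpleScalaSpark --master local[*] "$JAR"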
Upvotes: 0
Reputation: 612
Looks like there is an issue with your JAR. You can check which classes are present in your JAR by using the command:
vi supersimple.jar
If the SimpleScalaSpark class does not appear in the output of the previous command, it means your JAR is not built properly.
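Equivalently, the JDK's jar tool can list the archive contents; assuming it is on your PATH, something like this should show the class entry:
jar tf supersimple.jar | grep SimpleScalaSpark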
Upvotes: 1
Reputation: 212
IDEs work differently from the shell in many ways. I believe that from the shell you need to add the --jars parameter; see:
spark submit add multiple jars in classpath
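For example (the dependency paths are illustrative; --jars takes a comma-separated list of dependency JARs, while the application JAR is passed last):
./spark-submit --class SimpleScalaSpark --master local[*] \
  --jars ~/libs/dep1.jar,~/libs/dep2.jar \
  ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar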
Upvotes: 0
Reputation: 101
As per your description above, you are not giving the correct class name, so Spark is not able to find that class.
Just replace SimpleSparkScala with SimpleScalaSpark and try running this command:
./spark-submit --class SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar
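Note that if the object is declared inside a package, the --class argument must be the fully qualified name (the package below is illustrative):
./spark-submit --class com.example.SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar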
Upvotes: 1