Reputation: 793
I want to build a Spark application jar. My expectation is: when I execute the jar with ./spark-submit, the application will use my own built mllib (e.g. spark-mllib_2.11-2.2.0-SNAPSHOT.jar).
This is my build.sbt:
name := "SoftmaxMNIST"
version := "1.0"
scalaVersion := "2.11.4"
unmanagedJars in Compile += file("lib/spark-mllib_2.11-2.2.0-SNAPSHOT.jar")
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql" % "2.1.0"
)
// META-INF discarding
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}
I already dropped my own-built spark-mllib_2.11-2.2.0-SNAPSHOT.jar into the /My-Project-Path/lib/ directory, but it does not work. It seems that the application is still using Spark's default mllib jar, which in my case is in the PATH/spark-2.1.0-bin-hadoop2.7/jars/ directory.
PS: The ultimate goal is that when I run my application on AWS EC2, it always uses my own-built mllib instead of the default one, since I may modify my own mllib frequently.
Can anyone help me solve this? Thanks in advance!
Upvotes: 0
Views: 119
Reputation: 74679
The answer depends on how you run spark-submit. You have to "convince" (aka modify) spark-submit to see the modified jar (not the one in SPARK_HOME).
The quickest (though not necessarily easiest in the long run) approach is to include the Spark jars, including the one you've modified, in your uberjar (aka fat jar). You seem to be using the sbt-assembly plugin in your sbt project, so it's just a matter of publishLocal'ing the dependency (or putting it into the lib directory) and adding it to libraryDependencies in your project. The assembly task will do the rest.
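As an illustration only, and assuming the custom MLlib keeps its original coordinates and SNAPSHOT version, the relevant part of build.sbt for this approach could look roughly like this (if you install the custom build with Maven's mvn install, sbt needs Resolver.mavenLocal to find it; an sbt publishLocal is picked up automatically):

// resolve the custom build from the local Maven repository (only needed after mvn install)
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.0",
  "org.apache.spark" %% "spark-sql"   % "2.1.0",
  // the locally published, modified MLlib instead of the stock artifact
  "org.apache.spark" %% "spark-mllib" % "2.2.0-SNAPSHOT"
)

The assembly task then bundles the snapshot MLlib classes into the fat jar; you may additionally need to set spark.driver.userClassPathFirst and spark.executor.userClassPathFirst so the bundled classes take precedence over the ones shipped in SPARK_HOME/jars.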
That will, however, give you a really huge fat jar which, in a heavy development cycle with lots of compilation, testing and deployment, could make the process terribly slow.
The other approach is to use your custom Apache Spark (with the modified Spark MLlib library included). After you mvn install it, you'll have your custom Spark ready to use. Use spark-submit from that custom version and it's supposed to just work. You don't have to include the jar in your fat jar, and perhaps you won't have to use the sbt-assembly plugin at all (a mere sbt package should work).
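A sketch of what the build could shrink to in that setup, under the assumption that you compile against the locally installed 2.2.0-SNAPSHOT artifacts and mark them provided, so sbt package produces a thin jar and the custom distribution supplies Spark (including the modified MLlib) at runtime:

// compile against the locally installed custom Spark, but do not bundle it
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.2.0-SNAPSHOT" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.2.0-SNAPSHOT" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.2.0-SNAPSHOT" % "provided"
)

The spark-submit that ships with the custom distribution then picks up the modified MLlib from its own jars directory.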
That approach has the benefit of making your deployable Spark application package smaller and keeping the custom Spark separate from the development process. Use an internal library repository to publish the custom artifacts and depend on them.
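If you do publish to an internal repository, the consumer side is again just a resolver plus a dependency; a hypothetical sketch, where the repository name and URL are placeholders you would replace with your own:

// hypothetical internal repository holding the custom Spark artifacts (URL is a placeholder)
resolvers += "company-snapshots" at "https://repo.example.com/snapshots"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0-SNAPSHOT"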
Upvotes: 1