Felipe

Reputation: 7633

Why is the fat jar built with build.sbt for my Spark 3 application discarding my dependencies?

I am using Spark 3 with Scala 2.12.3. My application has some dependencies which I want to include in the fat jar file. One option I found is to build it with the sbt-assembly plugin, described at this link. To do this I have to create a project/assembly.sbt file:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

and my build.sbt file has:

name := "explore-spark"

version := "0.2"

scalaVersion := "2.12.3"

val sparkVersion = "3.0.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "com.twitter" %% "algebird-core" % "0.13.7",
  "joda-time" % "joda-time" % "2.5",
  "org.fusesource.mqtt-client" % "mqtt-client" % "1.16"
)

mainClass in(Compile, packageBin) := Some("org.sense.spark.app.App")
mainClass in assembly := Some("org.sense.spark.app.App")

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyJarName in assembly := s"${name.value}_${scalaBinaryVersion.value}-fat_${version.value}.jar"

Then I execute the command sbt assembly in the root directory of the project. I get warning messages saying that files are being discarded:

[info] Merging files...
[warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
[warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[warn] Merging 'META-INF/maven/com.googlecode.javaewah/JavaEWAH/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/com.googlecode.javaewah/JavaEWAH/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/joda-time/joda-time/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/joda-time/joda-time/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtbuf/hawtbuf/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtbuf/hawtbuf/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch-transport/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch-transport/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.hawtdispatch/hawtdispatch/pom.xml' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.mqtt-client/mqtt-client/pom.properties' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.fusesource.mqtt-client/mqtt-client/pom.xml' with strategy 'discard'
[warn] Strategy 'discard' was applied to 13 files
[warn] Strategy 'rename' was applied to 2 files
[info] SHA-1: 2f2a311b8c826caae5f65a3670a71aafa12e2dc7
[info] Packaging /home/felipe/workspace-idea/explore-spark/target/scala-2.12/explore-spark_2.12-fat_0.2.jar ...
[info] Done packaging.
[success] Total time: 13 s, completed Jul 20, 2020 12:44:37 PM
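
The discarded entries above all seem to be META-INF metadata files (pom.xml, pom.properties, MANIFEST.MF), not classes. I assume the merging could be controlled with an explicit assemblyMergeStrategy in build.sbt, something like the following sketch, which I have not applied:

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard // metadata only
  case _                             => MergeStrategy.first
}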

Then, when I try to submit my Spark application, I get the error java.lang.NoClassDefFoundError: org/fusesource/hawtbuf/Buffer. I created the fat jar file, but somehow it is discarding the dependencies that I need. This is how I submit the application, just to make sure that I am using the fat jar:

$ ./bin/spark-submit --master spark://127.0.0.1:7077 --deploy-mode cluster --driver-cores 4 --name "App" --conf "spark.driver.extraJavaOptions=-javaagent:/home/flink/spark-3.0.0-bin-hadoop2.7/jars/jmx_prometheus_javaagent-0.13.0.jar=8082:/home/flink/spark-3.0.0-bin-hadoop2.7/conf/spark.yml" /home/felipe/workspace-idea/explore-spark/target/scala-2.12/explore-spark_2.12-fat_0.2.jar -app 2

Upvotes: 2

Views: 1474

Answers (1)

Bartosz Konieczny

Reputation: 2033

You can debug in the following order:

  1. Ensure that the missing class is actually included in your fat jar. A jar is just an archive, so you can inspect its contents with the archive tools of your OS (see the jar listing sketch after this list).
  2. If it is included, check whether the same library is already present on the cluster you're using to run the code. If so, you can use shading as a solution (I explained the approach here: https://www.waitingforcode.com/apache-spark/shading-solution-dependency-hell-spark/read) or bump the dependencies, but that's a little bit risky. A shading sketch also follows this list.
  3. If it's not included in the jar, try to include it explicitly, which is what you did, but maybe it's the wrong version? See the explicit dependency sketch below.
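
For step 1, assuming the jar path printed by your build output, a minimal check from the shell could look like this:

$ jar tf /home/felipe/workspace-idea/explore-spark/target/scala-2.12/explore-spark_2.12-fat_0.2.jar | grep 'org/fusesource/hawtbuf/Buffer'

If org/fusesource/hawtbuf/Buffer.class is not listed, the library's classes never made it into the fat jar.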
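
For step 2, a sketch of a shade rule for sbt-assembly, assuming the conflict were with the fusesource packages (the shaded.fusesource prefix is just an illustrative name):

assemblyShadeRules in assembly := Seq(
  // relocate the conflicting packages inside the fat jar
  ShadeRule.rename("org.fusesource.**" -> "shaded.fusesource.@1").inAll
)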
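
For step 3, the transitive dependency could be pinned explicitly in build.sbt. The version below is only a guess; check which hawtbuf version mqtt-client 1.16 actually pulls in:

// hypothetical explicit pin of the transitive dependency; verify the version first
libraryDependencies += "org.fusesource.hawtbuf" % "hawtbuf" % "1.11"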

Upvotes: 1
