Reputation: 69
I am new to Scala and SBT build files. From the introductory tutorials, adding Spark dependencies to a Scala project should be straightforward via the sbt-spark-package plugin, but I am getting the following error:
[error] (run-main-0) java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
Please provide resources to learn more about what could be driving this error, as I want to understand the process more thoroughly.
CODE:
import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {
  // Lazily create (or reuse) a local SparkSession
  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .master("local")
      .appName("spark citation graph")
      .getOrCreate()
  }

  val sc = spark.sparkContext
}
import org.apache.spark.graphx.GraphLoader

object Test extends SparkSessionWrapper {
  def main(args: Array[String]): Unit = {
    println("Testing, testing, testing, testing...")
    val filePath = "Desktop/citations.txt"
    // Load the citation graph from an edge-list file and print one vertex
    val citeGraph = GraphLoader.edgeListFile(sc, filePath)
    println(citeGraph.vertices.take(1).mkString(","))
  }
}
plugins.sbt
resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")
build.sbt -- WORKING. Why does adding libraryDependencies make it run/work?
spName := "yewno/citation_graph"
version := "0.1"
scalaVersion := "2.11.12"
sparkVersion := "2.2.0"
sparkComponents ++= Seq("core", "sql", "graphx")
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-graphx" % "2.2.0"
)
build.sbt -- NOT WORKING. I would expect this to compile & run correctly:
spName := "yewno/citation_graph"
version := "0.1"
scalaVersion := "2.11.12"
sparkVersion := "2.2.0"
sparkComponents ++= Seq("core", "sql", "graphx")
Bonus for an explanation + links to resources to learn more about the SBT build process, jar files, and anything else that can help me get up to speed!
Upvotes: 0
Views: 2158
Reputation: 48420
The sbt-spark-package plugin adds the Spark dependencies in "provided" scope:
sparkComponentSet.map { component =>
  "org.apache.spark" %% s"spark-$component" % sparkVersion.value % "provided"
}.toSeq
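For the sparkVersion and sparkComponents declared in the question's build.sbt, that is roughly equivalent to writing the following yourself (a sketch of the effective dependencies, not the plugin's literal output):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-graphx" % "2.2.0" % "provided"
)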
We can confirm this by running show libraryDependencies from the sbt shell:
[info] * org.scala-lang:scala-library:2.11.12
[info] * org.apache.spark:spark-core:2.2.0:provided
[info] * org.apache.spark:spark-sql:2.2.0:provided
[info] * org.apache.spark:spark-graphx:2.2.0:provided
The "provided" scope means:
The dependency will be part of compilation and test, but excluded from the runtime.
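That is also why the first build.sbt runs: its explicitly listed, unscoped libraryDependencies put the Spark jars back on the run classpath. A minimal contrast using spark-core (the default-scope line is what the "WORKING" build.sbt effectively relies on):

// Default (compile) scope: the jar is on the compile, test, and run classpaths
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"

// "provided" scope: available for compilation and tests, but omitted from the
// run classpath (the assumption is that spark-submit / the cluster supplies these jars)
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"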
Thus sbt run throws java.lang.NoClassDefFoundError: org/apache/spark/SparkContext.
If we really do want to include provided dependencies on the run classpath, then @douglaz suggests:
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated
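Adding that line to the non-working build.sbt lets sbt run see SparkContext while still keeping the Spark jars out of any assembled package. On sbt 1.x the same setting is commonly written with slash syntax (an equivalent sketch; adjust to your sbt version):

Compile / run := Defaults.runTask(Compile / fullClasspath, Compile / run / mainClass, Compile / run / runner).evaluated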
Upvotes: 2