Reputation: 1483
I'm building an Apache Spark application in Scala and I'm using SBT to build it. Here is the thing:
1. when I'm developing under IntelliJ IDEA, I want Spark dependencies to be included in the classpath (so I can run a regular application with a main class)
2. when I package the application (with the sbt-assembly plugin), I do not want Spark dependencies to be included in my fat JAR
3. when I run unit tests through sbt test, I want Spark dependencies to be included in the classpath (same as #1 but from SBT)
To match constraint #2, I'm declaring Spark dependencies as provided:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  ...
)
Then, sbt-assembly's documentation suggests adding the following line to include the dependencies for unit tests (constraint #3):
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))
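(Side note: the <<= operator only exists in sbt 0.13; on sbt 1.x, the same task wiring would presumably be written in the slash syntax, roughly like this:)
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated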
That leaves me with constraint #1 not being fulfilled, i.e. I cannot run the application in IntelliJ IDEA, as the Spark dependencies are not being picked up.
With Maven, I was using a specific profile to build the uber JAR. That way, I was declaring Spark dependencies as regular dependencies for the main profile (IDE and unit tests) while declaring them as provided
for the fat JAR packaging. See https://github.com/aseigneurin/kafka-sandbox/blob/master/pom.xml
What is the best way to achieve this with SBT?
Upvotes: 44
Views: 26176
Reputation: 10236
For running the Spark jobs, you can keep the dependencies in the "provided" scope: https://stackoverflow.com/a/21803413/1091436
You can then run the app from sbt, IntelliJ IDEA, or anything else.
Example in sbt:
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated,
runMain in Compile := Defaults.runMainTask(fullClasspath in Compile, runner in (Compile, run)).evaluated
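(These two lines use the sbt 0.13 "key in Scope" notation; the runMain counterpart in sbt 1.x slash syntax would presumably look like this:)
Compile / runMain := Defaults.runMainTask(
  Compile / fullClasspath,
  Compile / run / runner
).evaluated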
Upvotes: 4
Reputation: 3376
[Obsolete] See the newer answer: use the 'Include dependencies with "Provided" scope' option in an IntelliJ run configuration.
The easiest way to add provided dependencies to debug a task with IntelliJ is to:
Right-click src/main/scala and select Mark Directory as... > Test Sources Root.
This tells IntelliJ to treat src/main/scala as a test folder, for which it adds all the dependencies tagged as provided to any run config (debug/run).
Every time you do an SBT refresh, redo this step, as IntelliJ will reset the folder to a regular source folder.
Upvotes: 2
Reputation: 3376
Use the new 'Include dependencies with "Provided" scope' option in an IntelliJ run configuration.
Upvotes: 25
Reputation: 11275
The main trick here is to create another subproject that will depend on the main subproject and will have all its provided libraries in compile scope. To do this, I add the following lines to build.sbt:
lazy val mainRunner = project.in(file("mainRunner")).dependsOn(RootProject(file("."))).settings(
  libraryDependencies ++= spark.map(_ % "compile")
)
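(In this snippet, spark is assumed to be the Seq of Spark module IDs declared elsewhere in the build; a sketch of what it presumably looks like, with a placeholder version:)
val sparkVersion = "2.1.0" // placeholder version
val spark = Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)
// the main project keeps them as provided, while the mainRunner subproject switches them to compile
libraryDependencies ++= spark.map(_ % "provided")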
Now I refresh the project in IDEA and slightly change the previous run configuration so that it uses the new mainRunner module's classpath.
Works flawlessly for me.
Upvotes: 4
Reputation: 1483
(Answering my own question with an answer I got from another channel...)
To be able to run the Spark application from IntelliJ IDEA, you simply have to create a main class in the src/test/scala directory (test, not main). IntelliJ will pick up the provided dependencies.
object Launch {
  def main(args: Array[String]): Unit = {
    Main.main(args)
  }
}
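(Main here is the application's real entry point under src/main/scala; a hypothetical example, just to make the relationship explicit:)
// src/main/scala/Main.scala -- hypothetical name; use your actual main object
object Main {
  def main(args: Array[String]): Unit = {
    // ... set up the SparkConf / streaming context and run the job ...
  }
}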
Thanks to Matthieu Blanc for pointing that out.
Upvotes: 19
Reputation: 101
A solution based on creating another subproject for running the project locally is described here.
Basically, you would need to modify the build.sbt file with the following:
lazy val sparkDependencies = Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)

libraryDependencies ++= sparkDependencies.map(_ % "provided")

lazy val localRunner = project.in(file("mainRunner")).dependsOn(RootProject(file("."))).settings(
  libraryDependencies ++= sparkDependencies.map(_ % "compile")
)
And then run the new subproject locally by selecting Use classpath of module: localRunner in the Run Configuration.
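As an aside (not part of the original answer), the same compile-scoped classpath should also work straight from the sbt shell via runMain; the main class name below is hypothetical:
sbt "localRunner/runMain com.example.Main"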
Upvotes: 2
Reputation: 421
Why not bypass sbt and manually add spark-core and spark-streaming as libraries to your module dependencies?
For example: org.apache.spark:spark-core_2.10:1.6.1
Upvotes: -2
Reputation: 5213
You should not be looking at SBT for an IDEA-specific setting. First of all, if the program is supposed to be run with spark-submit, how are you running it in IDEA? I am guessing you'd be running it standalone in IDEA, while running it through spark-submit normally. If that's the case, manually add the Spark libraries in IDEA, using File | Project Structure | Libraries. You'll see all dependencies listed from SBT, but you can add arbitrary jar/maven artifacts using the + (plus) sign. That should do the trick.
Upvotes: 0