YACINE GACI

Reputation: 145

Difference between running a Spark application with sbt run and with the spark-submit script

I am new to Spark, and while learning the framework I found that, to the best of my knowledge, there are two ways to run a Spark application written in Scala:

  1. Package the project into a JAR file, and then run it with the spark-submit script.
  2. Run the project directly with sbt run.

I am wondering what the difference between these two modes of execution is, especially since running with sbt run can throw a java.lang.InterruptedException while the same application runs perfectly with spark-submit.

Thanks!

Upvotes: 2

Views: 2298

Answers (2)

Harjeet Kumar

Reputation: 524

sbt and spark-submit are two completely different things:

  1. sbt is a build tool. If you have written a Spark application, sbt helps you compile the code and build a JAR file, managing the required dependencies along the way.
  2. spark-submit is used to submit a Spark job to a cluster manager. You may be using Standalone, Mesos, or YARN as your cluster manager. spark-submit submits your job to the cluster manager, and the job then starts on the cluster.
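To make the distinction concrete, here is a minimal sketch of an application that can be started either way. The object name SimpleApp and the toy job are hypothetical, and the local[*] fallback is an assumption about what you want when no master has been supplied. The idea is that spark-submit normally provides the master through its --master option, while sbt run provides nothing, so the code only sets a master if none is configured.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical example application; the name and the job are illustrative only.
object SimpleApp {
  def main(args: Array[String]): Unit = {
    // SparkConf picks up spark.* settings that the launcher has already provided.
    // If no master is configured (for example under `sbt run`), fall back to
    // local mode on all cores (assumption: local execution is acceptable here).
    val conf = new SparkConf()
      .setAppName("SimpleApp")
      .setIfMissing("spark.master", "local[*]")

    val spark = SparkSession.builder().config(conf).getOrCreate()

    try {
      // Toy job: count the even numbers in [0, 1000).
      val evens = spark.range(0, 1000).filter("id % 2 = 0").count()
      println(s"Even numbers: $evens")
    } finally {
      // Stop the session explicitly so Spark's background threads shut down cleanly.
      spark.stop()
    }
  }
}
```

Started with sbt run, this would run entirely inside one local JVM; packaged and passed to spark-submit, the same code would run against whichever cluster manager the --master option names.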

Hope this helps.

Cheers!

Upvotes: 2

Ged

Reputation: 18003

SBT is a build tool (that I like running on Linux), and using it does not necessarily imply using Spark. It just so happens that, like IntelliJ, it is commonly used to build Spark applications.

You can package and run an application in a single JVM from the SBT console, but not at scale. So, if you have created a Spark application and declared its dependencies, SBT's package task compiles the code and creates a JAR file of your own classes, with the declared dependencies resolved onto the classpath rather than bundled, so that you can run it locally.
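As a rough sketch of a build definition for this local mode, the build.sbt below declares Spark as an ordinary dependency so that sbt run has it on the classpath. The project name and the version numbers are assumptions, not taken from the question. The fork setting is included because a commonly reported cause of java.lang.InterruptedException under sbt run is that the application shares sbt's own JVM, and running it in a forked JVM is a frequently suggested remedy; whether it applies to the asker's case is not confirmed here.

```scala
// build.sbt (sbt 1.x syntax; name and versions are assumptions for illustration)
name := "simple-app"
version := "0.1.0"
scalaVersion := "2.12.18"

// Spark as a normal (compile-scope) dependency so `sbt run` can load it.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0"

// Run the application in a separate JVM instead of inside sbt's own JVM.
run / fork := true
```

With this in place, sbt run starts the application in its own local JVM, which is convenient for development but does not distribute any work to a cluster.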

You can also use the assembly task in SBT (provided by the sbt-assembly plugin), which creates an uber JAR (fat JAR) with all dependencies contained in it; you upload that JAR to your cluster and run it by invoking spark-submit. So, again, if you have created a Spark application and declared its dependencies, SBT's assembly task compiles the code and creates an uber JAR with all required dependencies, except for any external files that you need to ship to the workers, to run on your cluster (in general).
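A minimal sketch of the assembly setup might look like the following. The project name, the main class, and all version numbers are assumptions for illustration. Spark itself is marked as provided on the assumption that the cluster already supplies the Spark JARs, which keeps them out of the uber JAR.

```scala
// project/plugins.sbt (a separate file; the plugin version is an assumption):
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt (sbt 1.x syntax; name, main class and versions are hypothetical)
name := "simple-app"
version := "0.1.0"
scalaVersion := "2.12.18"

// "provided": needed to compile, but left out of the uber JAR because the
// cluster is assumed to supply Spark at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0" % "provided"

// Entry point recorded in the uber JAR's manifest (hypothetical class name).
assembly / mainClass := Some("SimpleApp")
```

Running sbt assembly would then write the uber JAR under the target directory, and that file is what you pass to spark-submit. Note that with a provided dependency, Spark is no longer on the classpath for a plain sbt run, so this layout suits the spark-submit route rather than the local one.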

Upvotes: 5
