Sri

Reputation: 643

Submitting Spark jobs on a standalone cluster

How do you externally add dependent JARs when submitting a Spark job? I would also like to know how to package dependent JARs with the application JAR.

Upvotes: 2

Views: 2000

Answers (1)

marios

Reputation: 8996

This is a popular question. I looked for a good answer on Stack Overflow but didn't find anything that answers it exactly as asked, so I will try to answer it here:


The best way to submit a job is to use the spark-submit script. This assumes that you already have a running cluster (distributed or local, it doesn't matter).

You can find this script under $SPARK_HOME/bin/spark-submit.

Here is an example:

spark-submit --name "YourAppNameHere" --class com.path.to.main --master spark://localhost:7077  --driver-memory 1G --conf spark.executor.memory=4g --conf spark.cores.max=100 theUberJar.jar

You give the app a name, point it at your main class, and specify the location of the Spark master (where the cluster runs). You can optionally pass other parameters. The last argument is the name of the uber jar that contains your main class and all of its dependencies.

theUberJar.jar relates to your second question, on how to package your app. If you are using Scala, the best way is to use sbt and create an uber jar with the sbt-assembly plugin.
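For completeness, here is a minimal sbt-assembly setup (just a sketch; the plugin and Spark versions are illustrative, and the jar/app names simply match the example above):

project/plugins.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

build.sbt:

name := "YourAppNameHere"
scalaVersion := "2.11.12"
// Mark Spark itself as "provided" so it is not bundled into the uber jar; the cluster already ships it.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"
// Name the assembled jar so it matches the spark-submit example above.
assemblyJarName in assembly := "theUberJar.jar"

Everything else your code depends on gets bundled into the uber jar, which covers the packaging part of your question.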

Here are the steps (the full commands are sketched after the list):

  • Create your uber jar using sbt assembly
  • Start the cluster ($SPARK_HOME/sbin/start-all.sh)
  • Submit the App to your running cluster using the uber jar from step 1
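Put together, those three steps look roughly like this (a sketch; the output path depends on your Scala version, and the master URL is the one used above):

sbt assembly
$SPARK_HOME/sbin/start-all.sh
$SPARK_HOME/bin/spark-submit --name "YourAppNameHere" --class com.path.to.main --master spark://localhost:7077 target/scala-2.11/theUberJar.jar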

Upvotes: 1
