Dark Shadows

Reputation: 65

Running Spark Job on Zeppelin

I have written a custom Spark library in Scala. I am able to run it successfully as a spark-submit step by spawning the cluster and running the following commands. First I fetch my two jars:

aws s3 cp s3://jars/RedshiftJDBC42-1.2.10.1009.jar .
aws s3 cp s3://jars/CustomJar .

and then I run my Spark job as

spark-submit --deploy-mode client --jars RedshiftJDBC42-1.2.10.1009.jar --packages com.databricks:spark-redshift_2.11:3.0.0-preview1,com.databricks:spark-avro_2.11:3.2.0 --class com.activities.CustomObject CustomJar.jar 

This runs my CustomObject successfully. I want to do the same thing in Zeppelin, but I do not know how to add the jars and then run a spark-submit step.

Upvotes: 2

Views: 2397

Answers (2)

Thomas Decaux

Reputation: 22671

It depends on how you run Spark. Most of the time, the Zeppelin interpreter embeds the Spark driver.

The solution is to configure the Zeppelin interpreter instead:

  • ZEPPELIN_INTP_JAVA_OPTS configures the Java options
  • SPARK_SUBMIT_OPTIONS configures the Spark options
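
As a minimal sketch, assuming the two jars from the question have already been copied onto the Zeppelin host (the local paths below are placeholders), you could set this in conf/zeppelin-env.sh:

export SPARK_SUBMIT_OPTIONS="--jars /home/hadoop/RedshiftJDBC42-1.2.10.1009.jar,/home/hadoop/CustomJar.jar --packages com.databricks:spark-redshift_2.11:3.0.0-preview1,com.databricks:spark-avro_2.11:3.2.0"

After editing zeppelin-env.sh, restart the Zeppelin daemon (or at least the Spark interpreter) so the new options are picked up.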

Upvotes: 0

Victor

Reputation: 2546

You can add these dependencies to the Spark interpreter within Zeppelin:

  • Go to "Interpreter"
  • Choose edit and add the jar file
  • Restart the interpreter

More info here
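
As a rough sketch of the "edit and add the jar file" step: in the Spark interpreter's edit screen you can either add the jar as a dependency artifact (a local path or a Maven coordinate), or set the standard Spark properties. The local path below is a placeholder for wherever you copied the jar:

spark.jars            /home/hadoop/RedshiftJDBC42-1.2.10.1009.jar
spark.jars.packages   com.databricks:spark-redshift_2.11:3.0.0-preview1,com.databricks:spark-avro_2.11:3.2.0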

EDIT: You might also want to use the %dep paragraph in order to access the z variable (which is an implicit Zeppelin context) and do something like this:

%dep
z.load("/some_absolute_path/myjar.jar")
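
If you also need the Maven packages from the question, z.load accepts Maven coordinates as well; a sketch, untested against your setup:

%dep
z.load("com.databricks:spark-redshift_2.11:3.0.0-preview1")
z.load("com.databricks:spark-avro_2.11:3.2.0")

Note that %dep paragraphs must run before the Spark context is started (i.e. before the first Spark paragraph); otherwise restart the interpreter and run them again.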

Upvotes: 2
