MozenRath

Reputation: 10030

How to schedule/trigger Spark jobs in Cloudera?

Our project currently runs on MapReduce (MR), and we use Oozie to orchestrate the MR jobs. We are now moving to Spark and would like to know the recommended ways to schedule or trigger Spark jobs on a CDH cluster. Note that CDH Oozie does not support Spark2 jobs, so please suggest an alternative.

Upvotes: 0

Views: 421

Answers (2)

OneCricketeer

Reputation: 191743

Last time I looked, Hue had a Spark option in the Workflow editor. If Cloudera didn't support that, I'm not sure why it'd be there...

CDH Oozie does support plain shell scripts, though; you just need to be sure that every NodeManager has the spark-submit command available locally.
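As a rough sketch of what such a shell script might look like, the snippet below checks that the submit command exists on whichever node runs the action before calling it. The function names and the SPARK_SUBMIT variable are my own placeholders, not anything Oozie or CDH defines, and the YARN flags are an assumption about how you want to deploy.

```shell
#!/usr/bin/env bash
# Hypothetical script for an Oozie shell action; all names here are illustrative.

# Binary to call. Assumption: on CDH 5 with the Spark 2 parcel this is
# typically spark2-submit, so it is left overridable.
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"

# Fail early with a clear message if this host lacks the command.
require_spark_submit() {
    if ! command -v "$SPARK_SUBMIT" >/dev/null 2>&1; then
        echo "ERROR: '$SPARK_SUBMIT' not found on ${HOSTNAME:-this host}" >&2
        return 1
    fi
}

# Submit one application to YARN; the caller passes the script or JAR
# followed by its arguments.
submit_job() {
    require_spark_submit || return 1
    "$SPARK_SUBMIT" --master yarn --deploy-mode cluster "$@"
}
```

You would then end the script with something like `submit_job <script_path> <arguments_list>`, so the action fails with a readable error on any node where the command is missing.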

If that doesn't work, it also supports Java actions for running a JAR, so you could write each of your Spark jobs to start from a main method that loads any configuration from there.

Upvotes: 1

Jim Todd

Reputation: 1588

As soon as you submit a Spark job from the shell, for example with spark-submit <script_path> <arguments_list>, it is submitted to the CDH cluster. You can immediately see the Spark job and its progress in Hue. This is how we trigger our Spark jobs.

Further, to orchestrate a series of jobs, you can wrap the spark-submit calls in a shell script. Alternatively, you can use a cron job to trigger them on a schedule.
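A minimal sketch of such a wrapper is below. It runs the jobs one after another and stops at the first failure, relying on spark-submit returning a non-zero exit code when a job fails. The function name, the SPARK_SUBMIT variable, the YARN flags, and the paths in the crontab comment are all placeholders of mine, not something CDH prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; job paths and names are placeholders.

# Assumption: override this if your cluster uses a different binary name
# (for example spark2-submit).
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"

# Run each listed application in order; stop at the first failure so that
# later jobs never run after an earlier one has failed.
run_pipeline() {
    for job in "$@"; do
        echo "Submitting $job"
        if ! "$SPARK_SUBMIT" --master yarn --deploy-mode cluster "$job"; then
            echo "Job failed: $job" >&2
            return 1
        fi
    done
    echo "Pipeline finished"
}

# Example crontab entry (daily at 02:00), assuming the wrapper is saved as
# /opt/jobs/run_pipeline.sh and ends with a call to run_pipeline:
#   0 2 * * * /opt/jobs/run_pipeline.sh >> /var/log/spark_pipeline.log 2>&1
```

Calling `run_pipeline <first_script_path> <second_script_path>` at the end of the script submits the two jobs in sequence. This gives you ordering and scheduling, but not the retries or dependency handling that Oozie provided.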

Upvotes: 0
