Reputation: 10030
Our project currently runs on MapReduce (MR), and we use Oozie to orchestrate the MR jobs. We are now moving to Spark and would like to know the recommended ways to schedule and trigger Spark jobs on a CDH cluster. Note that CDH Oozie does not support Spark2 jobs, so please suggest an alternative.
Upvotes: 0
Views: 421
Reputation: 191743
Last time I looked, Hue had a Spark option in the Workflow editor. If Cloudera didn't support that, I'm not sure why it'd be there...
CDH Oozie does support plain shell scripts, though, so you can call spark-submit from a shell action. You need to be sure the spark-submit command is available locally on every NodeManager, because the action may run on any of them.
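A minimal sketch of a script such a shell action could run, so that a missing spark-submit shows up as a clear error instead of a confusing failure. The function name `submit_job` and the `SPARK_SUBMIT` override variable are my own assumptions, not anything Oozie or CDH defines; the override is only there so you can point it at whatever your install calls the Spark 2 submit command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper meant to be called from an Oozie shell action.
# SPARK_SUBMIT is an assumed override variable; it defaults to spark-submit.

submit_job() {
  local submit_bin="${SPARK_SUBMIT:-spark-submit}"

  # Fail fast with a clear message if this node lacks the command.
  if ! command -v "$submit_bin" >/dev/null 2>&1; then
    echo "ERROR: $submit_bin not found on $(hostname)" >&2
    return 127
  fi

  # Forward all arguments unchanged to the submit command.
  "$submit_bin" "$@"
}

# When the script is given arguments, pass them straight through.
if [ "$#" -gt 0 ]; then
  submit_job "$@"
fi
```

The non-zero exit status is what lets the workflow treat the action as failed.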
If that doesn't work, it also supports Java actions for running a JAR, so you could write each of your Spark applications with a main method that loads any configuration it needs and starts the job from there.
Upvotes: 1
Reputation: 1588
As soon as you submit the spark job from the shell, like:
spark-submit <script_path> <arguments_list>
it gets submitted to the CDH cluster. You will immediately be able to see the Spark job and its progress in Hue. This is how we trigger our Spark jobs.
Further, to orchestrate a series of jobs, you can wrap the spark-submit calls in a shell script. Or, you can use a cron job to trigger it on a schedule.
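As a rough sketch of that wrapper idea: the script below runs the given jobs one after another and stops at the first failure. The name `run_pipeline`, the `SPARK_SUBMIT` override variable, and every path shown are placeholders I made up for illustration. It assumes YARN cluster mode, where spark-submit by default waits for the application to finish and returns a non-zero status if it fails.

```shell
#!/usr/bin/env bash
# Hypothetical sequential pipeline; job paths are supplied as arguments.
# SPARK_SUBMIT is an assumed override variable; it defaults to spark-submit.

run_pipeline() {
  local submit_bin="${SPARK_SUBMIT:-spark-submit}"
  local job

  for job in "$@"; do
    echo "Submitting $job"
    # Stop the chain at the first job that exits with a non-zero status.
    if ! "$submit_bin" --master yarn --deploy-mode cluster "$job"; then
      echo "Job failed: $job" >&2
      return 1
    fi
  done
}

# When the script is given job paths, run them in order.
if [ "$#" -gt 0 ]; then
  run_pipeline "$@"
fi

# Example crontab entry (placeholder paths), running the chain daily at 02:00:
# 0 2 * * * /path/to/run_pipeline.sh /path/to/job1.py /path/to/job2.py >> /tmp/pipeline.log 2>&1
```

Keep in mind that cron gives you no retries or dependency tracking, so anything beyond a simple chain probably calls for a real scheduler.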
Upvotes: 0