duckertito

Reputation: 3635

How to schedule the execution of spark-submit at a specific time

I have Spark batch-processing code (basically, model training) that I execute with spark-submit from an AWS EMR cluster. Now I want to launch this job each day at a specific time. What is the standard way to do this? Should I change the code and add the scheduling inside it? Or is there a way to schedule the spark-submit job itself? Or should I turn it into a Spark Streaming job executed every 24 hours? (I am interested in a specific time slot, though, i.e. between 11:00pm and 12:00am.)

Upvotes: 2

Views: 2748

Answers (2)

Ram Ghadiyaram

Reputation: 29185

Cron is the more traditional option, and it works well. Another option is Rundeck.

Use Rundeck as an easier to manage and more secure replacement for Cron or as a replacement for legacy tools like Control-M or HP Operations Orchestration. Rundeck gives your users a simple web interface (GUI or API) to go to for both on-demand and scheduled operations tasks.

What is Rundeck?

Rundeck is open source software that helps you automate routine operational procedures in data center or cloud environments. Rundeck provides a number of features that will alleviate time-consuming grunt work and make it easy for you to scale up your automation efforts and create self service for others. Teams can collaborate to share how processes are automated while others are given trust to view operational activity or execute tasks.

Rundeck allows you to run tasks on any number of nodes from a web-based or command-line interface. Rundeck also includes other features that make it easy to scale up your automation efforts including: access control, workflow building, scheduling, logging, and integration with external sources for node and option data.


Upvotes: 1

ulrich

Reputation: 3587

If you are using Linux, you can set up a cron job to call the spark-submit script: http://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/
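
As a rough sketch of what such a cron job might look like for the 11:00pm slot mentioned in the question: the entry below would go into the crontab of the user that normally runs the job (opened with `crontab -e`). The script path, log path, and the `--master yarn --deploy-mode cluster` options are assumptions for illustration and should be replaced with whatever you already pass to spark-submit by hand.

```
# Fields: minute hour day-of-month month day-of-week command
# Runs once a day at 23:00 in the machine's local time zone.
# Cron starts jobs with a minimal PATH, so the full path to spark-submit is given
# (/usr/bin/spark-submit is an assumed location; check it with `which spark-submit`).
# /home/hadoop/train_model.py and the log file are hypothetical placeholders.
0 23 * * * /usr/bin/spark-submit --master yarn --deploy-mode cluster /home/hadoop/train_model.py >> /home/hadoop/train_model.log 2>&1
```

Redirecting stdout and stderr to a log file is optional, but it helps with debugging because cron jobs have no terminal attached.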

Upvotes: 1
