Reputation: 900
My sensor data is captured in Hive tables, and I want to run Spark jobs on them at regular intervals, say as 15-minute, 30-minute, and 45-minute jobs.
We are using a cron scheduler to run the jobs (different spark-submits) at fixed intervals. The problem is that, due to YARN resource contention, jobs run slowly, and cron keeps triggering the same jobs again and again.
For example: a 30-minute job was triggered and then delayed by cluster resource issues, yet cron kept submitting another 30-minute job every 30 minutes.
Maybe one way to address this is using Quartz/Oozie scheduler actions.
Is there a programmatic approach to ensure that a job with a given name must complete before the next job with the same name is triggered?
What is the best way to schedule them?
Upvotes: 1
Views: 718
Reputation: 29195
Option 1: You can use Airflow as the scheduler and create dependencies between jobs.
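For instance, here is a minimal DAG sketch (Airflow 1.x-style imports; the aggregate class, jar paths, and the 30-minute schedule are placeholders, not values from the question). Setting max_active_runs=1 together with catchup=False means a new run never starts while the previous one is still going, which is exactly the overlap that plain cron cannot prevent:

# Minimal Airflow DAG sketch; jar paths and the aggregate class are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="sensor_spark_jobs_30min",
    start_date=datetime(2019, 1, 1),
    schedule_interval="*/30 * * * *",   # every 30 minutes
    catchup=False,
    max_active_runs=1,  # never start a run while the previous one is active
)

ingest = BashOperator(
    task_id="ingest_sensor_data",
    bash_command=(
        "spark-submit --master yarn --class com.javachain.javachainfeed "
        "/path/to/javachain_family-assembly-5.0.jar"
    ),
    dag=dag,
)

aggregate = BashOperator(
    task_id="aggregate_sensor_data",
    bash_command="spark-submit --master yarn --class com.example.Aggregate /path/to/agg.jar",
    dag=dag,
)

ingest >> aggregate  # the aggregate job runs only after ingest succeeds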
Option 2: Run the Apache Spark job from crontab on Unix through a wrapper script that takes a PID-file lock, which prevents duplicate job submissions:
#!/bin/bash
LOCKFILE=/tmp/filelock.pid
SPARK_PROGRAM_CLASS=com.javachain.javachainfeed
SPARK_PROGRAM_JAR=javachain_family-assembly-5.0.jar
HIVE_TABLE=javachain_prd_tbls.Family_data

echolog() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*"; }

# Process locking: if the lock file points at a live process, bail out
if [ -f "${LOCKFILE}" ]; then
    PID=$(cat "${LOCKFILE}")
    if ps -f -p "${PID}" > /dev/null; then
        echolog "Already running as pid ${PID}"
        exit 0
    fi
    [ -z "$DEBUGME" ] || echolog "${LOCKFILE} exists but contains PID ${PID} of a prior process"
else
    [ -z "$DEBUGME" ] || echolog "${LOCKFILE} does not exist, will create one"
fi
echo $$ > "${LOCKFILE}"

# Remove the lock file when the script exits, even on failure
trap 'rm -f "${LOCKFILE}"' EXIT

# Submit one Spark job per feed name (first field of each line)
while read -r FEEDNAME _; do
    spark-submit --master yarn-client --driver-memory 10G \
        --executor-memory 8G --num-executors 30 \
        --class "${SPARK_PROGRAM_CLASS}" "${SPARK_PROGRAM_JAR}" \
        --hiveTable "${HIVE_TABLE}" --className "${FEEDNAME}"
done < "familynames.txt"
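The wrapper itself can then be scheduled from crontab, e.g. */30 * * * * /path/to/run_feed_jobs.sh (the script path here is a placeholder). If a run fires while the previous one is still active, it exits immediately instead of submitting a duplicate. As an alternative to the hand-rolled PID file, flock(1) gives the same guarantee in one line, e.g. flock -n /tmp/filelock.lock /path/to/run_feed_jobs.sh, and releases the lock automatically when the process dies, so stale lock files are never an issue.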
Also, to give Spark jobs fair access to cluster resources, I would suggest configuring the YARN Fair Scheduler (not FIFO).
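As a rough sketch (queue names and weights below are examples, not required values), the Fair Scheduler is enabled in yarn-site.xml and queues are described in a separate allocation file:

<!-- yarn-site.xml: switch the ResourceManager to the Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<!-- fair-scheduler.xml: example allocation file with equally weighted queues -->
<allocations>
  <queue name="jobs15min">
    <weight>1.0</weight>
  </queue>
  <queue name="jobs30min">
    <weight>1.0</weight>
  </queue>
</allocations>

With fair scheduling, one long-running job can no longer hold all the containers and starve the shorter-interval jobs.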
Upvotes: 0