botkop

Reputation: 964

how to find out if spark jobs have finished before launching new one

I want to run a bunch of spark jobs in parallel through yarn, then wait for all of them to finish before launching another set of jobs. How can I find out when my first set of jobs has finished? Thank you.

Upvotes: 1

Views: 1141

Answers (1)

Kishore

Reputation: 5881

Sample workaround:

Give your Spark job a unique name in the spark-submit command.

spark-submit --master yarn-cluster --name spark_job_name job1.jar

Check on YARN whether the Spark job is still running. If it is not running, launch your second job. Bash script below:

JOB="spark_job_name"

applicationId=$(yarn application -list -appStates RUNNING | awk -v tmpJob=$JOB '{ if( $2 == tmpJob) print $1 }')

if [ ! -z $applicationId ]
  then
  echo " "
  echo "JOB: ${JOB} is already running. ApplicationId : ${applicationId}"
  echo " "
  else
  printf "first job is not running. Starting the spark job. ${JOB}\n"
  echo " "
  spark-submit --master yarn-cluster --name spark_job_name2 job2.jar
  echo " "
fi
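
Building on the same idea, one way to wait for a whole set of jobs to finish is to poll YARN until none of the job names show up as RUNNING any more, and only then submit the next set. The sketch below is not part of the original answer; the job names job_a and job_b, the 30-second polling interval, and the final spark-submit line are placeholders to adapt to your setup.

#!/bin/bash

JOBS="job_a job_b"

while true; do
  # Names of all applications currently RUNNING on YARN (second column of the listing).
  running=$(yarn application -list -appStates RUNNING 2>/dev/null | awk '{ print $2 }')

  still_running=0
  for job in $JOBS; do
    if echo "$running" | grep -qx "$job"; then
      still_running=1
    fi
  done

  # None of the first set's jobs are running any more: stop waiting.
  if [ "$still_running" -eq 0 ]; then
    break
  fi

  sleep 30
done

# All jobs of the first set have finished; launch the next set here.
spark-submit --master yarn-cluster --name next_job next_job.jar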

Upvotes: 2
