Reputation: 964
I want to run a bunch of spark jobs in parallel through yarn, then wait for all of them to finish before launching another set of jobs. How can I find out when my first set of jobs has finished? Thank you.
Upvotes: 1
Views: 1141
Reputation: 5881
Sample workaround:
Give your Spark job a unique name in the spark-submit command.
spark-submit --master yarn-cluster --name spark_job_name job1.jar
Check on YARN whether that Spark job is still running. If it is not running, launch your second job. Bash script below:
JOB="spark_job_name"
applicationId=$(yarn application -list -appStates RUNNING | awk -v tmpJob=$JOB '{ if( $2 == tmpJob) print $1 }')
if [ ! -z $applicationId ]
then
echo " "
echo "JOB: ${JOB} is already running. ApplicationId : ${applicationId}"
echo " "
else
printf "first job is not running. Starting the spark job. ${JOB}\n"
echo " "
spark-submit --master yarn-cluster --name spark_job_name2 job2.jar
echo " "
fi
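To answer the waiting part of the question, the same check can be turned into a polling loop. This is a minimal sketch, assuming every job in the first set was submitted with a known --name (the JOBS names below are placeholders) and that polling yarn application -list every 30 seconds is acceptable:

#!/bin/bash

# Names given to the first set of jobs via spark-submit --name (placeholder values).
JOBS=("spark_job_1" "spark_job_2" "spark_job_3")

# Poll YARN until none of the named jobs are still pending or running.
while true
do
    active=$(yarn application -list -appStates ACCEPTED,RUNNING | awk '{ print $2 }')
    still_active=0
    for job in "${JOBS[@]}"
    do
        if echo "$active" | grep -qx "$job"
        then
            still_active=1
            break
        fi
    done
    if [ "$still_active" -eq 0 ]
    then
        break
    fi
    sleep 30   # wait before checking again
done

echo "First set of jobs finished. Launching the next set."
spark-submit --master yarn-cluster --name spark_job_name2 job2.jar

Note that a job which fails also leaves the ACCEPTED/RUNNING states, so this loop treats failed and finished jobs the same way; check yarn application -status on the individual application ids if you need to tell them apart.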
Upvotes: 2