huangbiubiu

Reputation: 1271

How to get the Spark SUBMISSION_ID with spark-submit?

Many places need a SUBMISSION_ID, such as spark-submit --status and the Spark REST API. But how can I get this SUBMISSION_ID when I use the spark-submit command to submit Spark jobs?

P.S.:

I use Python's popen to start a spark-submit job. I want the SUBMISSION_ID so that my Python program can monitor the Spark job's status via the REST API: <ip>:6066/v1/submissions/status/<SUBMISSION_ID>

Upvotes: 1

Views: 5509

Answers (1)

huangbiubiu

Reputation: 1271

Thanks to @Pandey for the clue. The answer at https://stackoverflow.com/a/37980813/5634636 helped me a lot.

TL;DR: submit the job through Spark's hidden submission REST API instead of the spark-submit CLI; the submissionId is returned in the JSON response and can then be used to query the job status.

Detailed description

NOTE: I have only tested this approach on Apache Spark 2.3.1. I can't guarantee that it will work in other versions as well.

Let me clarify my requirements first. There are three features I wanted:

  1. submit a Spark job remotely
  2. check the job status at any time (RUNNING, ERROR, FINISHED, ...)
  3. get the error message if something goes wrong

Submit locally

NOTE: this approach only works in cluster mode.

The Spark tool spark-submit will do the job; a sketch of driving it from Python follows.
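Since the question launches spark-submit from Python with popen, here is a minimal sketch of that flow. It assumes a standalone master whose REST port is 6066 and that spark-submit echoes a submission response containing an id of the form driver-<timestamp>-<counter>; the master URL, jar path, and main class are placeholders.

import re
import subprocess

# Placeholders: adjust the master URL, jar path, and main class for your cluster.
cmd = [
    "spark-submit",
    "--master", "spark://master-host:6066",  # REST submission port
    "--deploy-mode", "cluster",              # cluster mode only, as noted above
    "--class", "com.example.Main",           # hypothetical main class
    "hdfs:///jars/my-app.jar",               # hypothetical jar location
]

proc = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
output, _ = proc.communicate()

# Assumption: the response echoed by spark-submit contains an id such as
# driver-20190315160943-0005; parsing stdout like this is inherently fragile,
# which is why the REST API below is the recommended route.
match = re.search(r"driver-\d{14}-\d{4}", output)
submission_id = match.group(0) if match else None
print("submission id:", submission_id)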

Submit remotely

The Spark submission API is recommended here. There seems to be no documentation for it on the official Apache Spark website, so some people call it the hidden API. For details, see: https://www.nitendragautam.com/spark/submit-apache-spark-job-with-rest-api/

  • To submit a Spark job, use the submit API (see the Python sketch after this list).
  • To get the status of a job, use the status API: http://<master-ip>:6066/v1/submissions/status/<submission-id>. The submission-id is returned in a JSON response when you submit the job.
  • The error message is included in the status response.
  • More about error messages: note the difference between the states ERROR and FAILED. In short, FAILED means something went wrong while the Spark job was executing (e.g. an uncaught exception), while ERROR means something went wrong while it was being submitted (e.g. an invalid jar path). The ERROR message is included in the status JSON; the reason for a FAILED state can be viewed at http://<driver-ip>:<ui-port>/log/<submission-id>.
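Here is a minimal sketch of both calls using Python's requests library. The master host, jar path, main class, and Spark properties are placeholders; the payload layout follows the CreateSubmissionRequest format described in the linked article.

import requests

MASTER = "http://master-host:6066"  # placeholder master host

# Submit a job through the hidden submission API.
payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "hdfs:///jars/my-app.jar",  # hypothetical jar location
    "mainClass": "com.example.Main",           # hypothetical main class
    "clientSparkVersion": "2.3.1",
    "sparkProperties": {
        "spark.app.name": "my-app",
        "spark.master": "spark://master-host:6066",
        "spark.submit.deployMode": "cluster",
    },
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "appArgs": [],
}
resp = requests.post(MASTER + "/v1/submissions/create", json=payload).json()
submission_id = resp["submissionId"]  # returned on submit, as noted above

# Poll the status API with the returned submission id.
status = requests.get(
    MASTER + "/v1/submissions/status/" + submission_id
).json()
print(status["driverState"], status.get("message", ""))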

Here is an example of an ERROR status (**** is an invalid jar path, miswritten intentionally; note that "success" : true refers to the status query itself, not to the job):

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "ERROR",
  "message" : "Exception from the cluster:\njava.io.FileNotFoundException: File hdfs:**** does not exist.\n\torg.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)\n\torg.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)\n\torg.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)\n\torg.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)\n\torg.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)\n\torg.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)\n\torg.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:727)\n\torg.apache.spark.util.Utils$.doFetchFile(Utils.scala:695)\n\torg.apache.spark.util.Utils$.fetchFile(Utils.scala:488)\n\torg.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)\n\torg.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)\n\torg.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)",
  "serverSparkVersion" : "2.3.1",
  "submissionId" : "driver-20190315160943-0005",
  "success" : true,
  "workerHostPort" : "172.18.0.4:36962",
  "workerId" : "worker-20190306214522-172.18.0.4-36962"
}
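Finally, a small sketch of how a monitoring script might interpret such a response, using the ERROR/FAILED distinction and the log URL pattern given above (the driver UI host and port are placeholders):

def report(status: dict, driver_ui: str = "http://driver-host:8081") -> str:
    """Summarize a SubmissionStatusResponse dict.

    driver_ui is a placeholder for the <driver-ip>:<ui-port> mentioned above.
    """
    state = status["driverState"]
    if state == "ERROR":
        # Submission-time problem: the reason is in the status JSON itself.
        return "submission error: " + status.get("message", "<no message>")
    if state == "FAILED":
        # Runtime failure: the reason lives in the driver log.
        return "job failed, see {}/log/{}".format(driver_ui, status["submissionId"])
    return "state: " + state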

Upvotes: 2
