Rdesmond

Reputation: 1201

Spark: Monitoring a cluster mode application

Right now I'm using spark-submit to launch an application in cluster mode. The response from the master server is a JSON object containing a submissionId, which I use to identify the application and kill it if necessary. However, I haven't found a simple way to retrieve the worker REST URL from either the master server's response or the driver id (I could probably scrape the master web UI, but that would be ugly). Instead, I have to wait until the application finishes and then look up the application statistics from the history server.

Is there any way to use the driver-id to find the worker URL of an application deployed in cluster mode (usually at worker-node:4040)?

For reference, here's the relevant spark-submit output:

16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: Submission successfully created as driver-20160812114003-0001. Polling submission state...
16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160812114003-0001 in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: State of driver driver-20160812114003-0001 is now RUNNING.
16/08/12 11:39:47 INFO RestSubmissionClient: Driver is running on worker worker-20160812113715-192.xxx-46215 at 192.xxx:46215.
16/08/12 11:39:47 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
    "action" : "CreateSubmissionResponse",
    "message" : "Driver successfully submitted as driver-20160812114003-0001",
    "serverSparkVersion" : "1.6.1",
    "submissionId" : "driver-20160812114003-0001",
    "success" : true
}
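For context, the submit-and-kill workflow I have today boils down to two REST calls against the master (a minimal sketch in Python; the master host and driver id are placeholders, and the status/kill endpoints are the same /v1/submissions routes visible in the DEBUG output below):

    import requests

    MASTER = "http://masterurl:6066"  # Spark's REST submission endpoint (placeholder host)
    driver_id = "driver-20160812114003-0001"

    # Poll the driver's state (the response carries driverState, e.g. RUNNING)
    status = requests.get(MASTER + "/v1/submissions/status/" + driver_id).json()
    print(status["driverState"])

    # Kill the driver if necessary
    kill = requests.post(MASTER + "/v1/submissions/kill/" + driver_id).json()
    print(kill["success"])

What I can't get from any of these responses is the worker's REST URL.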

EDIT: Here's what typical output looks like with log4j console logging set to DEBUG.

Spark-submit command:

./apps/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --master mesos://masterurl:7077 
    --verbose --class MainClass --deploy-mode cluster
    ~/path/myjar.jar args

Spark-submit output:

Using properties file: null
Parsed arguments:
  master                  mesos://masterurl:7077
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               MyApp
  primaryResource         file:/path/myjar.jar
  name                    MyApp
  childArgs               [args]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:



Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/path/myjar.jar
MyApp
args
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> MyApp
spark.jars -> file:/path/myjar.jar
spark.submit.deployMode -> cluster
spark.master -> mesos://masterurl:7077
Classpath elements:



16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending POST request to server at http://masterurl:7077/v1/submissions/create:
{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ args ],
  "appResource" : "file:/path/myjar.jar",
  "clientSparkVersion" : "2.0.0",
  "environmentVariables" : {
    "SPARK_SCALA_VERSION" : "2.10"
  },
  "mainClass" : "SimpleSort",
  "sparkProperties" : {
    "spark.jars" : "file:/path/myjar.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyApp",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "mesos://masterurl:7077"
  }
}
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: Submission successfully created as driver-20160817132658-0004. Polling submission state...
16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160817132658-0004 in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending GET request to server at http://masterurl:7077/v1/submissions/status/driver-20160817132658-0004.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: State of driver driver-20160817132658-0004 is now RUNNING.
16/08/17 13:26:49 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}

Upvotes: 2

Views: 1903

Answers (2)

arbazkhan002

Reputation: 1333

Does the master server's response not provide the application-id?

I believe all you need for this problem is the master URL and the application-id of your application. Once you have the application-id, use port 4040 on the master URL and append your intended endpoint to it.

For example, if your application-id is application_1468141556944_1055:

To get the list of all jobs

http://<master>:4040/api/v1/applications/application_1468141556944_1055/jobs

To get the list of stored RDDs

http://<master>:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
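If you want to hit those endpoints from a script rather than a browser, something like this should work (a sketch; the host and application id are placeholders taken from the URLs above):

    import requests

    # Placeholder host and application id -- substitute your own
    BASE = "http://master:4040/api/v1/applications/application_1468141556944_1055"

    # List all jobs for the application
    for job in requests.get(BASE + "/jobs").json():
        print(job["jobId"], job["status"])

    # List stored RDDs
    for rdd in requests.get(BASE + "/storage/rdd").json():
        print(rdd["id"], rdd["name"])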

However, if you don't have the application-id, I would probably start with the following:

Set verbose mode (--verbose) while launching the Spark job to get the application id on the console. You can then parse the log output for the application-id. The log output usually looks like:

16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)

so the application-id here is application_1468141556944_3791.
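A small sketch of that parsing step (assuming spark-submit's output is piped in on stdin; the regex just matches the application_<timestamp>_<sequence> pattern):

    import re
    import sys

    # Match ids like application_1468141556944_3791 in the log stream
    APP_ID = re.compile(r"application_\d+_\d+")

    for line in sys.stdin:
        match = APP_ID.search(line)
        if match:
            print(match.group(0))
            break

You'd run it as something like spark-submit ... 2>&1 | python find_app_id.py (the script name is made up).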

You can also find the master URL and application-id through the tracking URL in the log output, which looks like:

    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: 10.50.0.33
    ApplicationMaster RPC port: 0
    queue: ns_debug
    start time: 1470992969127
    final status: UNDEFINED
    tracking URL: http://<master>:8088/proxy/application_1468141556944_3799/

These messages are at the INFO log level, so make sure you set log4j.rootCategory=INFO, console in your log4j.properties file so that you can see them.
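For reference, a minimal log4j.properties in the spirit of Spark's bundled conf/log4j.properties.template looks like this:

    # Log INFO and above to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n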

Upvotes: 3

Rdesmond

Reputation: 1201

I had to scrape the Spark master web UI for an application id close to the submission id (within the same minute and with the same suffix, e.g. 20161010025XXX-0005 with X as a wildcard), then look for the worker URL in the link tag after it. It's not pretty, reliable, or secure, but it'll work for now. I'm leaving this open for a bit in case someone has another approach.
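Roughly, the scrape looks like this (a sketch only: the master UI port, the HTML structure, and the fuzzy id matching are assumptions from my setup, so treat every name here as hypothetical):

    import re
    import requests

    MASTER_UI = "http://masterurl:8080"      # standalone master web UI (assumed port)
    fuzzy_id = "driver-20161010025XXX-0005"  # X = wildcard digits within the same minute

    # Build a regex: exact match up to the minute, any digit where the X's are,
    # then grab the first <a href=...> link that follows the id in the page.
    pattern = re.compile(
        re.escape(fuzzy_id).replace("X", r"\d") + r'.*?<a href="([^"]+)"',
        re.DOTALL,
    )

    html = requests.get(MASTER_UI).text
    match = pattern.search(html)
    if match:
        print("worker url:", match.group(1))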

Upvotes: 0
