Mohanad Kaleia

Reputation: 791

How to check status of Spark applications from the command line?

To check running applications in Apache Spark, you can use the web interface at the URL:

http://<master>:8080

My question is: how can we check running applications from the terminal? Is there any command that returns the application status?

Upvotes: 36

Views: 88476

Answers (4)

Lana Nova

Reputation: 360

I have found that it is possible to use the REST API to submit, kill, and get the status of Spark jobs. The REST API is exposed on the master on port 6066.

  1. To create the job, use the following curl command:

    curl -X POST http://spark-cluster-ip:6066/v1/submissions/create \
       --header "Content-Type:application/json;charset=UTF-8" \
       --data \
        '{
            "action" : "CreateSubmissionRequest",
            "appArgs" : [ "blah" ],
            "appResource" : "path-to-jar-file",
            "clientSparkVersion" : "2.2.0",
            "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
            "mainClass" : "app-class",
            "sparkProperties" : { 
                "spark.jars" : "path-to-jar-file",
                "spark.driver.supervise" : "false",
                "spark.app.name" : "app-name",
                "spark.submit.deployMode" : "cluster",
                "spark.master" : "spark://spark-master-ip:6066" 
             }
         }'
    

    The response includes the success or failure of the operation and a submissionId:

    {
       "submissionId" : "driver-20170829014216-0001",
       "serverSparkVersion" : "2.2.0",
       "success" : true,
       "message" : "Driver successfully submitted as driver-20170829014216-0001",
       "action" : "CreateSubmissionResponse"
    }
    
  2. To kill the job, use the submissionId obtained above:

     curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20170829014216-0001
    

    The response again contains success/failure status:

    {
         "success" : true,
         "message" : "Kill request for driver-20170829014216-0001 submitted",
         "action" : "KillSubmissionResponse",
         "serverSparkVersion" : "2.2.0",
         "submissionId" : "driver-20170829014216-0001"
    }
    
  3. To get the status, use the following command:

    curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20170829014216-0001
    

    The response includes the driver state, i.e. the current status of the app:

    {
      "action" : "SubmissionStatusResponse",
      "driverState" : "RUNNING",
      "serverSparkVersion" : "2.2.0",
      "submissionId" : "driver-20170829203736-0004",
      "success" : true,
      "workerHostPort" : "10.32.1.18:38317",
      "workerId" : "worker-20170829013941-10.32.1.18-38317"
    }
    

I found out about the REST API here.
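
Putting the three calls together, here is a minimal polling sketch. It assumes the master's REST port is 6066 as above, that jq is installed for JSON parsing, and that the request body from step 1 has been saved to a file named create-request.json (the file name and the 10-second interval are just placeholders):

# Submit the job and capture the generated submissionId from the response.
MASTER="http://spark-cluster-ip:6066"
SUBMISSION_ID=$(curl -s -X POST "$MASTER/v1/submissions/create" \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data @create-request.json | jq -r '.submissionId')

# Poll the status endpoint until the driver reaches a terminal state.
while true; do
  STATE=$(curl -s "$MASTER/v1/submissions/status/$SUBMISSION_ID" \
    | jq -r '.driverState')
  echo "driver $SUBMISSION_ID is $STATE"
  case "$STATE" in
    FINISHED|FAILED|KILLED|ERROR) break ;;
  esac
  sleep 10
done

# To abort the job instead of waiting, kill it by the same submissionId:
# curl -X POST "$MASTER/v1/submissions/kill/$SUBMISSION_ID"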

Upvotes: 3

Prashant_M

Reputation: 3134

In my case, my Spark application runs remotely on Amazon's AWS EMR, so I use the Lynx command-line browser to access the Spark application's status. After you have submitted your Spark job from one terminal, open another terminal and run the following command from it:

    lynx http://localhost:<4043 or other spark job port>
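
If Lynx is not installed, the same UI can be queried without a browser: each application UI also serves machine-readable JSON under /api/v1 (Spark's monitoring REST API). A minimal sketch, assuming the job's UI is on the default port 4040:

# List the applications known to this UI, as JSON.
curl http://localhost:4040/api/v1/applications

# Drill into the jobs of one application, using the "id" field
# from the previous response in place of <app-id>.
curl http://localhost:4040/api/v1/applications/<app-id>/jobs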

Upvotes: 2

Jacek Laskowski

Reputation: 74789

If it's for Spark Standalone or Apache Mesos cluster managers, @sb0709's answer is the way to follow.

For YARN, you should use the yarn application command:

$ yarn application -help
usage: application
 -appStates <States>             Works with -list to filter applications
                                 based on input comma-separated list of
                                 application states. The valid application
                                 state can be one of the following:
                                 ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
                                 NING,FINISHED,FAILED,KILLED
 -appTypes <Types>               Works with -list to filter applications
                                 based on input comma-separated list of
                                 application types.
 -help                           Displays help for all commands.
 -kill <Application ID>          Kills the application.
 -list                           List applications. Supports optional use
                                 of -appTypes to filter applications based
                                 on application type, and -appStates to
                                 filter applications based on application
                                 state.
 -movetoqueue <Application ID>   Moves the application to a different
                                 queue.
 -queue <Queue Name>             Works with the movetoqueue command to
                                 specify which queue to move an
                                 application to.
 -status <Application ID>        Prints the status of the application.
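
For example, using the flags from the help text above (the application ID below is made up; take a real one from the -list output):

# List running applications, narrowed to Spark (Spark on YARN registers
# with application type SPARK).
yarn application -list -appStates RUNNING -appTypes SPARK

# Print the status of a single application by its ID.
yarn application -status application_1504000000000_0001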

Upvotes: 21

n1tk

Reputation: 2500

You can use spark-submit --status (as described in Mastering Apache Spark 2.0).

spark-submit --status [submission ID]

See the code of spark-submit for reference:

if (!master.startsWith("spark://") && !master.startsWith("mesos://")) {
  SparkSubmit.printErrorAndExit(
    "Requesting submission statuses is only supported in standalone or Mesos mode!")
}
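
For example, reusing the driver ID from the REST answer above (a sketch; substitute your own standalone master URL and submission ID):

# Ask the standalone master for the driver's status; the submission ID is
# the same driver-... identifier returned at submit time.
spark-submit --master spark://spark-master-ip:6066 --status driver-20170829014216-0001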

Upvotes: 11
