Karthik

Reputation: 191

How to get names of the currently running hadoop jobs?

I need to get the list of job names that are currently running, but hadoop job -list gives me only a list of jobIDs.

Upvotes: 19

Views: 37531

Answers (8)

David Ongaro

Reputation: 3937

If you use Hadoop YARN, don't use mapred job -list (or its deprecated version hadoop job -list); just do

yarn application -appStates RUNNING -list

That also prints out the application/job name. For MapReduce applications, you can get the corresponding JobId by replacing the application prefix of the Application-Id with job.
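A quick sketch of both steps. The column layout is an assumption based on typical tab-separated YARN CLI output, and the sample Application-Id is made up:

```shell
# Print "Application-Id <tab> Name" for running applications.
# Assumes the name is the second tab-separated field, as in typical
# `yarn application -list` output; adjust if your version differs.
yarn application -appStates RUNNING -list 2>/dev/null \
  | awk -F'\t' '/^application_/ { print $1 "\t" $2 }'

# Derive the MapReduce JobId from an Application-Id (sample id is made up):
echo 'application_1411111111111_0001' | sed 's/^application/job/'
# -> job_1411111111111_0001
```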

Upvotes: 14

Sheeri

Reputation: 608

I needed to look through history, so I changed mapred job -list to mapred job -list all....

I ended up adding a -L to the curl command, so the block there was:

curl -s -L -XGET {}

This allows for redirection, such as if the job is retired and in the job history. I also found that it's JobName in the history HTML, so I changed the grep:

grep 'Job.*Name' 

Plus of course changing hadoop to mapred. Here's the full command:

mapred job -list all | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -L -XGET {} | grep 'Job.*Name' | sed 's/.* //' | sed 's/<br>//'"

(I also changed around the first grep so that I was only looking at a certain username....YMMV)
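A sketch of just the extraction step, run against a made-up line standing in for the retired-job history HTML (the real markup may differ between Hadoop versions):

```shell
# Hypothetical line as it might appear in retired-job history HTML;
# the grep/sed pair below is the same one used in the full pipeline.
line='JobName: wordcount<br>'
echo "$line" | grep 'Job.*Name' | sed 's/.* //' | sed 's/<br>//'
# -> wordcount
```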

Upvotes: 0

Naresh Jangra

Reputation: 48

Just in case anyone is interested in the latest query to get the job name :-). Modified Pirooz's command:

mapred job -list 2> /dev/null | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File'" | awk '{print $3}' | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null" | egrep 'mapreduce.job.name' | awk -F"<value>" '{print $2}' | awk -F"</value>" '{print $1}'

Upvotes: 0

mohamus

Reputation: 83

By typing jps in your terminal.

Upvotes: -1

Pirooz

Reputation: 1278

Modifying AnthonyF's script, you can use the following on Yarn:

mapred job -list 2> /dev/null | egrep '^\sjob' | awk '{print $1}' | xargs -n 1 -I {} sh -c "mapred job -status {} 2>/dev/null | egrep 'Job File' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "hadoop fs -cat {} 2>/dev/null | egrep 'mapreduce.job.name' | sed 's/.*<value>//' | sed 's/<\/value>.*//'"
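The sed pair at the end just strips the &lt;value&gt;...&lt;/value&gt; wrapper around the job name in the job configuration file. A sketch against a made-up property line (the property name is real; the job name is invented):

```shell
# Hypothetical property entry as it might appear in a job.xml file:
line='<property><name>mapreduce.job.name</name><value>wordcount</value></property>'
echo "$line" | egrep 'mapreduce.job.name' \
  | sed 's/.*<value>//' | sed 's/<\/value>.*//'
# -> wordcount
```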

Upvotes: 3

USB

Reputation: 6139

You can find the information in the JobTracker UI, where you can see:

Jobid
Priority    
User
Name of the job
State of the job (whether it succeeded or failed)
Start Time  
Finish Time 
Map % Complete  
Reduce % Complete, etc.


Upvotes: 0

AnthonyF

Reputation: 888

I've had to do this a number of times, so I came up with the following command line that you can throw in a script somewhere and reuse. It prints the jobid followed by the job name.

hadoop job -list | egrep '^job' | awk '{print $1}' | xargs -n 1 -I {} sh -c "hadoop job -status {} | egrep '^tracking' | awk '{print \$3}'" | xargs -n 1 -I{} sh -c "echo -n {} | sed 's/.*jobid=//'; echo -n ' ';curl -s -XGET {} | grep 'Job Name' | sed 's/.* //' | sed 's/<br>//'"
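The jobid part of the output comes from the tracking URL itself: sed's .* is greedy, so everything up to the last jobid= is removed. A sketch on a made-up tracking URL:

```shell
# Hypothetical tracking URL as printed by `hadoop job -status`;
# the greedy .* consumes everything through "jobid=", leaving the id.
url='http://jt.example.com:50030/jobdetails.jsp?jobid=job_201301010000_0001'
echo -n "$url" | sed 's/.*jobid=//'
# -> job_201301010000_0001
```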

Upvotes: 27

QuinnG

Reputation: 6424

If you do $HADOOP_HOME/bin/hadoop job -status <jobid>, you will get a tracking URL in the output. Going to that URL will give you the tracking page, which has the name

Job Name: <job name here>

The -status command also gives a job file, which can also be seen from the tracking URL. In this file is a mapred.job.name property which holds the job name.

I didn't find a way to access the job name from the command line. Not to say there isn't... but not found by me. :)

The tracking URL and XML file are probably your best options for getting the job name.

Upvotes: 1
