Rohit

Reputation: 624

Hadoop MapReduce Jobs: Get the counter outside of the native jvm

I am very new to Hadoop and HBase.

My use case is very simple: I want to read a job's "reduce input groups" counter at run time (i.e. watch the counter being updated from the start to the end of the job).

What I have found so far: all job-related logs are written under the directory /var/log/hadoop/userlogs, as shown below:

[root@dev1-slave1 userlogs]# pwd
/var/log/hadoop/userlogs
[root@dev1-slave1 userlogs]# ll
total 24
drwx--x--- 2 mapred mapred 4096 Jan 13 19:59 job_201501121917_0008
drwx--x--- 2 mapred mapred 4096 Jan 13 11:31 job_201501121917_0009
drwx--x--- 2 mapred mapred 4096 Jan 13 12:01 job_201501121917_0010
drwx--x--- 2 mapred mapred 4096 Jan 13 12:13 job_201501121917_0011
drwx--x--- 2 mapred mapred 4096 Jan 13 12:23 job_201501121917_0012
drwx--x--- 2 mapred mapred 4096 Jan 13 19:59 job_201501121917_0013

Under each job directory there are attempt directories such as attempt_201501121917_0013_m_000000_0 (mapper log) and attempt_201501121917_0013_r_000000_0 (reducer log).

The reducer log directory attempt_201501121917_0013_r_000000_0 contains a syslog file with information about the job run, but it does not show anything about the counter.

In the Hadoop JobTracker UI I can see the counter reduce input groups being updated until the job finishes, but I could not find the same information anywhere else.

How can I achieve this? Is there a Java API to get job-wise counters from another application (NOT from the application that is running the MapReduce job)?

Any other logs or other files which I should look into?

I hope my requirement is clear.

UPDATE:

Hadoop version: Hadoop 1.0.3-Intel

Upvotes: 1

Views: 1945

Answers (3)

Eponymous

Reputation: 6811

You can also get the counters from the command line without writing any Java:

hadoop job -counter job_id group_name counter_name

or (for newer versions)

mapred job -counter job_id group_name counter_name
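For the counter in the question, the full invocation would look like the sketch below. The job id is one taken from the question's log listing (substitute your own), and the group name is the old-API built-in task counter group used in the other answers on this page; the snippet only builds and prints the command line, since running it requires a live cluster:

```shell
# Assemble the counter-query command for the question's job.
JOB_ID=job_201501121917_0013
GROUP='org.apache.hadoop.mapred.Task$Counter'   # single quotes: '$' must stay literal
COUNTER=REDUCE_INPUT_GROUPS
CMD="hadoop job -counter $JOB_ID $GROUP $COUNTER"
echo "$CMD"
```

Note that the group name must be quoted in a real shell session, or `$Counter` will be expanded as a variable.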

Upvotes: 1

Rohit

Reputation: 624

I found the answer to my question myself, in a different way.

Here is the code:

import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.Counters.Counter;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class jobclienttest {
        public static void main(String[] args) {
                // Address of the JobTracker; adjust to your cluster.
                String jobTrackerHost = "192.168.151.14";
                int jobTrackerPort = 54311;
                try {
                        // Connect to the JobTracker directly, from outside the
                        // application that submitted the MapReduce job.
                        JobClient jobClient = new JobClient(
                                new InetSocketAddress(jobTrackerHost, jobTrackerPort),
                                new Configuration());
                        // Jobs that are not yet complete (running or queued).
                        JobStatus[] activeJobs = jobClient.jobsToComplete();

                        for (JobStatus js : activeJobs) {
                                System.out.println(js.getJobID());
                                RunningJob runningJob = jobClient.getJob(js.getJobID());
                                Counters counters = runningJob.getCounters();
                                // Built-in task counter group of the old mapred API.
                                Counter counter = counters.findCounter(
                                        "org.apache.hadoop.mapred.Task$Counter",
                                        "REDUCE_INPUT_GROUPS");
                                System.out.println(counter.getValue());
                        }
                } catch (Exception ex) {
                        ex.printStackTrace();
                }
        }
}
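Since the goal is to watch the counter while the job runs, the lookup above can be wrapped in a simple polling loop. The sketch below is self-contained: the `readCounter` and `jobDone` suppliers are stand-ins I introduce for illustration; in the real program they would call `runningJob.getCounters().findCounter(...).getValue()` and `runningJob.isComplete()` on each iteration.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

public class CounterPoller {
        // Polls a counter until the job is reported finished, printing each change.
        // In the real JobClient-based code, readCounter would re-fetch the counter
        // from the JobTracker and jobDone would check runningJob.isComplete().
        static long pollUntilDone(LongSupplier readCounter, BooleanSupplier jobDone,
                                  long sleepMillis) throws InterruptedException {
                long last = -1;
                while (!jobDone.getAsBoolean()) {
                        long current = readCounter.getAsLong();
                        if (current != last) {
                                System.out.println("REDUCE_INPUT_GROUPS = " + current);
                                last = current;
                        }
                        Thread.sleep(sleepMillis);
                }
                return readCounter.getAsLong(); // final value after completion
        }

        public static void main(String[] args) throws InterruptedException {
                // Mocked job: the counter climbs to 5, then the job finishes.
                long[] state = {0};
                LongSupplier counter = () -> Math.min(state[0]++, 5);
                BooleanSupplier done = () -> state[0] > 6;
                System.out.println("final = " + pollUntilDone(counter, done, 1));
        }
}
```

A short sleep between polls keeps the JobTracker from being hammered with RPC calls.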

The code should be self-explanatory; the class names speak for themselves.

COMPILE:

javac -classpath /usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:. jobclienttest.java

RUN:

java -classpath /usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:. jobclienttest

This prints the value of the counter.

Upvotes: 0

yurgis

Reputation: 4067

Assuming you know your job id, you can look the job up by id (only for a limited time, I think, depending on how soon your cluster cleans up the job history).

public long getInputGroups(String jobId, Configuration conf)
        throws IOException, InterruptedException {
    Cluster cluster = new Cluster(conf);
    Job job = cluster.getJob(JobID.forName(jobId));
    Counters counters = job.getCounters();
    Counter counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", "REDUCE_INPUT_GROUPS");
    return counter.getValue();
}
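`JobID.forName` expects the textual form visible in the question's log directories: `job_<jtIdentifier>_<sequence>`. A minimal illustration of that format, using plain string handling (no Hadoop classes, so the part names here are just descriptive labels, not Hadoop API names):

```java
public class JobIdFormat {
        public static void main(String[] args) {
                // A job id from the question's log listing:
                // "job_" + JobTracker start timestamp + "_" + zero-padded sequence.
                String jobId = "job_201501121917_0013";
                String[] parts = jobId.split("_");
                System.out.println("prefix     = " + parts[0]); // job
                System.out.println("identifier = " + parts[1]); // 201501121917
                System.out.println("sequence   = " + Integer.parseInt(parts[2])); // 13
        }
}
```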

For more reading see Hadoop: The Definitive Guide.

Upvotes: 1
