A. Sarid

Reputation: 3996

Using Hadoop Counters - Multiple jobs

I am working on a mapreduce project using Hadoop. I currently have 3 sequential jobs.

I want to use Hadoop counters, but the problem is that I want to make the actual count in the first job, but access the counter value in the reducer of the 3rd job.

How can I achieve this? Where should I define the enum? Do I need to pass it through the second job? It would also help to see a code example for doing this, as I couldn't find anything yet.

Note: I am using Hadoop 2.7.2

EDIT: I already tried the approach explained here and it didn't succeed. My case is different, as I want to access the counters from a different job (not from mapper to reducer).

What I tried to do: First Job:

public static void startFirstJob(String inputPath, String outputPath) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "wordCount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(inputPath));
    FileOutputFormat.setOutputPath(job, new Path(outputPath));
    job.waitForCompletion(true);
}

Defined the counter enum in a different class:

public class CountersClass {
    public static enum N_COUNTERS {
        SOMECOUNT
    }
}

Trying to read counter:

Cluster cluster = new Cluster(context.getConfiguration());
Job job = cluster.getJob(JobID.forName("wordCount"));
Counters counters = job.getCounters();
CountersClass.N_COUNTERS mycounter = CountersClass.N_COUNTERS.valueOf("SOMECOUNT");
Counter c1 = counters.findCounter(mycounter);
long N_Count = c1.getValue();

Upvotes: 4

Views: 3360

Answers (2)

yurgis

Reputation: 4077

The classic solution is to put the first job's counter value into the configuration of the subsequent job where you need to access it:

First, make sure the counter is incremented correctly in the counting job's mapper/reducer:

context.getCounter(CountersClass.N_COUNTERS.SOMECOUNT).increment(1);

Then after counting job completion:

job.waitForCompletion(true);

Counter someCount = job.getCounters().findCounter(CountersClass.N_COUNTERS.SOMECOUNT);

//put counter value into conf object of the job where you need to access it
//you can choose any name for the conf key really (i just used counter enum name here)
job2.getConfiguration().setLong(CountersClass.N_COUNTERS.SOMECOUNT.name(), someCount.getValue());

The next piece is to access it in the other job's mapper/reducer. Just override setup(). For example:

private long someCount;

@Override
protected void setup(Context context) throws IOException,
    InterruptedException {
  super.setup(context);
  this.someCount = context.getConfiguration().getLong(CountersClass.N_COUNTERS.SOMECOUNT.name(), 0);
}
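Putting these pieces together, the driver for the three-job chain might look like the sketch below. The job names and the omitted mapper/reducer wiring are assumptions taken from the question; this is untested against a live cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
    // Mirrors the CountersClass from the question.
    public static class CountersClass {
        public static enum N_COUNTERS { SOMECOUNT }
    }

    public static void main(String[] args) throws Exception {
        // Job 1 does the actual counting: its mapper/reducer calls
        // context.getCounter(CountersClass.N_COUNTERS.SOMECOUNT).increment(1)
        Job job1 = Job.getInstance(new Configuration(), "wordCount");
        // ... set mapper/reducer/input/output as in the question ...
        job1.waitForCompletion(true);

        // Read the counter from the completed Job object we still hold.
        Counter someCount = job1.getCounters()
                .findCounter(CountersClass.N_COUNTERS.SOMECOUNT);

        // Job 2 is a plain pass-through; nothing counter-related needed.
        Job job2 = Job.getInstance(new Configuration(), "secondJob");
        // ...
        job2.waitForCompletion(true);

        // Job 3: inject the value into its Configuration BEFORE submission,
        // so the reducer can read it back in setup().
        Configuration conf3 = new Configuration();
        conf3.setLong(CountersClass.N_COUNTERS.SOMECOUNT.name(),
                someCount.getValue());
        Job job3 = Job.getInstance(conf3, "thirdJob");
        // ...
        job3.waitForCompletion(true);
    }
}
```

The key point is that the counter value must be set on job3's Configuration before that job is submitted; Configuration changes after submission are not visible to the tasks.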

Upvotes: 5

Radim

Reputation: 4808

Get the counters at the end of your first job, write their value to a file, and read it in your subsequent job. Write it to HDFS if you want to read it from a reducer, or to a local file if you will read it and initialize things in the application (driver) code.

Counters counters = job.getCounters();
Counter c1 = counters.findCounter(COUNTER_NAME);
System.out.println(c1.getDisplayName() + ":" + c1.getValue());

Reading and writing files is covered in the basic tutorials.
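A sketch of that approach, using the CountersClass enum from the question and a hypothetical HDFS path (`/tmp/somecount.txt`); not tested against a live cluster:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CounterFile {
    // Hypothetical HDFS location for the counter value.
    static final Path COUNTER_PATH = new Path("/tmp/somecount.txt");

    // Mirrors the CountersClass from the question.
    public static class CountersClass {
        public static enum N_COUNTERS { SOMECOUNT }
    }

    // Call after the first job completes: persist the counter to HDFS.
    static void writeCounter(Job job, Configuration conf) throws Exception {
        long value = job.getCounters()
                .findCounter(CountersClass.N_COUNTERS.SOMECOUNT).getValue();
        FileSystem fs = FileSystem.get(conf);
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(
                fs.create(COUNTER_PATH, true), StandardCharsets.UTF_8))) {
            out.println(value);
        }
    }

    // Call from the third job's reducer setup(): read the value back.
    static long readCounter(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                fs.open(COUNTER_PATH), StandardCharsets.UTF_8))) {
            return Long.parseLong(in.readLine().trim());
        }
    }
}
```

Because the file lives on HDFS, any task of the third job can open it, regardless of which node it runs on; the small extra cost is one HDFS read per task in setup().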

Upvotes: 3
