BlueDolphin

Reputation: 9765

Infinite loop in org.apache.hadoop.mapred.TaskTracker

I am running a simple Hadoop application that collects information from a 64MB file. The map phase finishes quickly, in roughly 2-5 minutes, and the reduce phase also runs fast up to 16%, but then it just stops.

This is the program output

11/12/20 17:53:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/20 17:53:46 INFO mapred.JobClient: Running job: job_201112201749_0001
11/12/20 17:53:47 INFO mapred.JobClient:  map 0% reduce 0%
11/12/20 17:54:06 INFO mapred.JobClient:  map 4% reduce 0%
11/12/20 17:54:09 INFO mapred.JobClient:  map 15% reduce 0%
11/12/20 17:54:12 INFO mapred.JobClient:  map 28% reduce 0%
11/12/20 17:54:15 INFO mapred.JobClient:  map 40% reduce 0%
11/12/20 17:54:18 INFO mapred.JobClient:  map 53% reduce 0%
11/12/20 17:54:21 INFO mapred.JobClient:  map 64% reduce 0%
11/12/20 17:54:24 INFO mapred.JobClient:  map 77% reduce 0%
11/12/20 17:54:27 INFO mapred.JobClient:  map 89% reduce 0%
11/12/20 17:54:30 INFO mapred.JobClient:  map 98% reduce 0%
11/12/20 17:54:33 INFO mapred.JobClient:  map 100% reduce 0%
11/12/20 17:54:54 INFO mapred.JobClient:  map 100% reduce 8%
11/12/20 17:54:57 INFO mapred.JobClient:  map 100% reduce 16%

In the data node log I see the same messages over and over again. The following is how it starts:

2011-12-20 17:54:51,353 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.083333336% reduce > copy (1 of 4 at 9.01 MB/s) >
2011-12-20 17:54:51,507 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060, dest: 127.0.0.1:44367, bytes: 75623263, op: MAPRED_SHUFFLE, cliID: attempt_201112201749_0001_m_000000_0, duration: 2161793492
2011-12-20 17:54:54,389 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:54:57,433 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:55:40,359 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@53d3cf JVM BUG(s) - injecting delay3 times
2011-12-20 17:55:40,359 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@53d3cf JVM BUG(s) - recreating selector 3 times, canceled keys 72 times
2011-12-20 17:57:51,518 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:57:57,536 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:58:03,554 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >

...

Here is the code

package com.bluedolphin;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MyJob {
    private final static LongWritable one = new LongWritable(1);
    private final static Text word = new Text();

    public static class MyMapClass extends Mapper<LongWritable, Text, Text, LongWritable> {
        public void map(LongWritable key, 
                    Text value, 
                    Context context) throws IOException, InterruptedException {
            String[] citation = value.toString().split(",");
            word.set(citation[0]);

            context.write(word, one);
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private LongWritable result = new LongWritable();
        public void reduce(
                Text key, 
                Iterator<LongWritable> values, 
                Context context) throws IOException, InterruptedException {
            int sum = 0;

            while (values.hasNext()) {
                sum += values.next().get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }


    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: myjob <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "patent citation");
        job.setJarByClass(MyJob.class);
        job.setMapperClass(MyMapClass.class);
        // job.setCombinerClass(MyReducer.class);
        // job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(0);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}

I don't know how to troubleshoot this further.

Thanks in advance.

Upvotes: 2

Views: 1872

Answers (3)

user2722185

Reputation: 21

I also ran into this infinite loop during the reduce phase. After struggling for a day, the solution turned out to be adjusting the /etc/hosts file.

It seems that the entry "127.0.1.1 your_machine's_name" confuses Hadoop. One piece of evidence for this is being unable to reach slave:50060, the TaskTracker on the slave machine, from the master machine.

Once that "127.0.1.1 your_machine's_name" entry is deleted and "your_machine's_name" is appended to the "127.0.0.1 localhost" entry, the problem goes away.
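
For illustration, the change described above would look roughly like the following in /etc/hosts (your_machine's_name is just a placeholder for the actual hostname):

# before (problematic)
127.0.0.1    localhost
127.0.1.1    your_machine's_name

# after (working)
127.0.0.1    localhost your_machine's_name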

I hope this observation can be helpful.

Upvotes: 0

BlueDolphin

Reputation: 9765

I figured out the solution: in the reduce method signature, I should have been using Iterable, not Iterator. Because of that, my reduce method was never actually called. It runs fine now, but I don't know the internal reason why it hung before.
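
For reference, a minimal sketch of the corrected reducer (same types as the code in the question) could look like this. Adding @Override is worth it here: a reduce method that takes an Iterator is just an overload, it does not override Reducer.reduce, so it compiles cleanly but is never invoked by the framework.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable result = new LongWritable();

    @Override
    public void reduce(Text key,
                       Iterable<LongWritable> values,   // Iterable, not Iterator
                       Context context) throws IOException, InterruptedException {
        long sum = 0;
        // Sum all counts emitted by the mappers for this key.
        for (LongWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}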

Upvotes: 1

Nishant Nagwani

Reputation: 1270

A couple of things I noticed in your code:

Since you update the Text object "word" in map and the LongWritable object "result" in reduce on every call, you probably shouldn't declare them final (although I don't see this as a problem in this case, since only the objects' state is changing).

Your code looks similar to the trivial word count; the only difference is that you emit just one value per record in map. You could simply eliminate the reduce phase (i.e., run a map-only job) and see whether you get what you expect from map.
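
As a rough illustration of that last point, a map-only driver might look like the sketch below. It assumes MyJob.MyMapClass from the question is on the classpath; the driver class name and job name here are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.bluedolphin.MyJob;

public class MapOnlyDebugJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "map-only debug run");
        job.setJarByClass(MapOnlyDebugJob.class);
        job.setMapperClass(MyJob.MyMapClass.class); // reuse the mapper from the question
        job.setNumReduceTasks(0);                   // no reduce phase: map output is written straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With zero reduce tasks, each mapper's output lands in part-m-* files in the output directory, which makes it easy to check whether the map phase alone produces what you expect.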

Upvotes: 0

Related Questions