Reputation: 97
My WordCount program is failing after the map phase; the error below is thrown. This is the first MapReduce program I have tried after setting up Hadoop.
OS: Mac
Hadoop version: 1.2.1
$HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk -Djava.net.preferIPv4Stack=true"
Hadoop log:
14/06/10 20:58:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/06/10 20:58:59 INFO input.FileInputFormat: Total input paths to process : 1
14/06/10 20:58:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/10 20:58:59 WARN snappy.LoadSnappy: Snappy native library not loaded
14/06/10 20:58:59 INFO mapred.JobClient: Running job: job_201406102056_0002
14/06/10 20:59:00 INFO mapred.JobClient: map 0% reduce 0%
14/06/10 20:59:06 INFO mapred.JobClient: map 100% reduce 0%
14/06/10 21:01:04 INFO mapred.JobClient: Task Id : attempt_201406102056_0002_m_000000_0, Status : FAILED
Too many fetch-failures
14/06/10 21:01:05 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 407 for URL: http://localhost:50060/tasklog?plaintext=true&attemptid=attempt_201406102056_0002_m_000000_0&filter=stdout
14/06/10 21:01:05 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 407 for URL: http://localhost:50060/tasklog?plaintext=true&attemptid=attempt_201406102056_0002_m_000000_0&filter=stderr
14/06/10 21:01:06 INFO mapred.JobClient: map 0% reduce 0%
14/06/10 21:01:08 INFO mapred.JobClient: map 100% reduce 0%
14/06/10 21:02:10 INFO mapred.JobClient: Task Id : attempt_201406102056_0002_m_000000_1, Status : FAILED
Too many fetch-failures
14/06/10 21:03:10 WARN mapred.JobClient: Error reading task outputRead timed out
14/06/10 21:04:10 WARN mapred.JobClient: Error reading task outputRead timed out
14/06/10 21:06:55 INFO mapred.JobClient: Task Id : attempt_201406102056_0002_m_000000_2, Status : FAILED
Too many fetch-failures
My WordCount program:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Upvotes: 0
Views: 748
Reputation: 34184
This error indicates that the reducers are unable to fetch the map output. If it happens repeatedly, the TaskTrackers serving that output can get blacklisted. Also make sure you are not suffering from improper DNS resolution; on a Mac running in pseudo-distributed mode, a hostname that does not resolve to a reachable address is a common cause of fetch failures, so verify that your machine's hostname maps correctly in /etc/hosts.
Make sure that you have enough HTTP threads on the mapper side. This can be tuned through the tasktracker.http.threads property in your mapred-site.xml file; it sets the number of worker threads for the TaskTracker's embedded HTTP server, which is what serves map output to the reducers. You can also increase the number of parallel reduce-side transfers through the mapred.reduce.parallel.copies property.
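For illustration, here is a mapred-site.xml sketch with both properties raised above their Hadoop 1.x defaults (40 HTTP threads, 5 parallel copies). The values shown are assumptions to tune for your own cluster, not drop-in recommendations, and tasktracker.http.threads only takes effect after the TaskTrackers are restarted:
<configuration>
  <!-- Worker threads for the TaskTracker's embedded HTTP server,
       which serves map output to reducers (default: 40). -->
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <!-- Copier threads each reduce task uses to fetch map output
       in parallel (default: 5). -->
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
</configuration>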
P.S.: Add the following lines to your driver code to avoid further problems (the map output key and value classes default to the job's final output classes, so setting them explicitly guards against type-mismatch errors if the two ever differ):
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
Upvotes: 2