Henrique Andrade

Reputation: 991

Hadoop - WordCount runs fine, but another example gets stuck

I ran WordCount on a single node on my Mac and it worked, so I wrote another MapReduce application and ran it, but it gets stuck at map 10% reduce 0%, and sometimes at map 0% reduce 0%. Here is the code of the application I made:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TemperatureMaximale {

    public static class TemperatureMapper extends Mapper<Object, Text, Text, IntWritable>{

        private Text city = new Text();
        private IntWritable temperature = new IntWritable();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String line = itr.nextToken();
                String cityStr = line.split(",")[0];
                int temperatureInt = Integer.parseInt(line.split(",")[1].replaceAll("\\s+", ""));
                city.set(cityStr);
                temperature.set(temperatureInt);
                context.write(city, temperature);

            }
        }

    }

    public static class TemperatureReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE; 
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            result.set(maxValue);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "temperature");
        job.setJarByClass(TemperatureMaximale.class);
        job.setMapperClass(TemperatureMapper.class);
        job.setCombinerClass(TemperatureReducer.class);
        job.setReducerClass(TemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
} 

I have no idea why this doesn't work, since it's basically a copy of WordCount; I just perform different operations in the map and reduce methods.

Example of the files I'm using as input:

Toronto, 20
Whitby, 25
New York, 22
Rome, 32
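
As an aside, the comma parsing used in the mapper can be exercised outside Hadoop with a plain Java sketch (the class and method names here are illustrative, not part of the original job):

```java
// Standalone sketch of the mapper's "City, Temperature" record parsing.
// Illustrative only; not part of the original MapReduce job.
public class TemperatureParseSketch {

    // Splits one "City, Temperature" record into its two fields,
    // stripping whitespace from the temperature as the mapper does.
    static String[] parseRecord(String line) {
        String city = line.split(",")[0];
        String temp = line.split(",")[1].replaceAll("\\s+", "");
        return new String[] { city, temp };
    }

    public static void main(String[] args) {
        String[] fields = parseRecord("Toronto, 20");
        System.out.println(fields[0] + " -> " + fields[1]); // prints "Toronto -> 20"
    }
}
```

Note that the mapper above first tokenizes each line by whitespace with StringTokenizer before splitting on commas, so a multi-word city such as "New York" would be broken into separate tokens; splitting whole lines, as in this sketch, keeps the city name intact.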

Upvotes: 0

Views: 76

Answers (1)

Henrique Andrade

Reputation: 991

I figured it out: it was simply a lack of memory to execute the job. If you run hadoop job -list, you can see the memory required by each job. In my case it was 4096 MB, so I closed all other applications and the jobs ran fine.

You can also solve this by configuring YARN in mapred-site.xml to allocate less memory to the job, as follows:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx819m</value>
</property>

mapreduce.map.memory.mb and mapreduce.reduce.memory.mb set the YARN container physical memory limits for your map and reduce processes, respectively.

mapreduce.map.java.opts and mapreduce.reduce.java.opts set the JVM heap size for your map and reduce processes, respectively. As a general rule, each should be about 80% of the corresponding YARN physical memory setting.
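
The 80% rule is simple arithmetic; a minimal sketch (class and method names are illustrative):

```java
// Illustrative sketch of the 80% heap-sizing rule of thumb described above:
// the JVM heap (-Xmx) should be roughly 80% of the YARN container limit,
// leaving headroom for non-heap JVM memory.
public class HeapSizing {

    // Returns a suggested -Xmx value (in MB) for a given YARN container size.
    static int suggestedHeapMb(int containerMb) {
        return (int) (containerMb * 0.8);
    }

    public static void main(String[] args) {
        // For a 1024 MB container, the rule suggests roughly an 819 MB heap.
        System.out.println("-Xmx" + suggestedHeapMb(1024) + "m");
    }
}
```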

Upvotes: 1
