S Kr

Reputation: 1840

Why am I getting a class cast exception in my Hadoop MapReduce program?

My job fails with the exception shown below the code. My map should emit key/value pairs of type Text/IntWritable. As far as I can tell it does, but I still get an IOException.

public class AverageClaimsPerPatentsByCountry {


    public static class  MyMap extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {

            String[] fields = value.toString().split(",");
            if(fields.length >=7) {
                String country = fields[4];
                String claimsCount = fields[8];
                System.out.println(value.toString());

                int i = Integer.valueOf(claimsCount);
                System.out.println(country+" --> "+i);
                if(claimsCount.length() > 0) {

                    output.collect(new Text(country), new IntWritable(i));
                }
            }
        }

    }

    public static class MyReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, DoubleWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, DoubleWritable> output, Reporter reporter)
                throws IOException {
            int count = 0;
            double claimsCount = 0;
            while(values.hasNext()) {
                claimsCount+=Double.valueOf(values.next().get());
                count++;
            }
            double average = claimsCount/count;
            output.collect(key, new DoubleWritable(average));           
        }

    }

    public static class MyJob extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            JobConf job = new JobConf(conf, MyJob.class);
            FileInputFormat.addInputPaths(job, "patents/patents.csv");
            FileOutputFormat.setOutputPath(job, new Path("patents/output"));
            job.setInputFormat(TextInputFormat.class);
            job.setOutputFormat(TextOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);
            job.setMapperClass(MyMap.class);
            job.setReducerClass(MyReducer.class);
            JobClient.runJob(job);
            return 0;
        }

    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        ToolRunner.run(conf, new MyJob(), args);
    }

}

Exception:
12/09/30 18:32:34 INFO mapred.JobClient: Running job: job_local_0001
12/09/30 18:32:34 INFO mapred.FileInputFormat: Total input paths to process : 1
12/09/30 18:32:34 INFO mapred.MapTask: numReduceTasks: 1
12/09/30 18:32:34 INFO mapred.MapTask: io.sort.mb = 100
12/09/30 18:32:35 INFO mapred.MapTask: data buffer = 79691776/99614720
12/09/30 18:32:35 INFO mapred.MapTask: record buffer = 262144/327680
4000000,1976,6206,1974,"US","NV",,1,10,106,1,12,12,17,0.3333,0.7197,0.375,8.6471,26.8333,,,,
"US" --> 10
12/09/30 18:32:35 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.DoubleWritable, recieved org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:850)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
    at action.eg1.AverageClaimsPerPatentsByCountry$MyMap.map(AverageClaimsPerPatentsByCountry.java:53)
    at action.eg1.AverageClaimsPerPatentsByCountry$MyMap.map(AverageClaimsPerPatentsByCountry.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/09/30 18:32:35 INFO mapred.JobClient:  map 0% reduce 0%
12/09/30 18:32:35 INFO mapred.JobClient: Job complete: job_local_0001
12/09/30 18:32:35 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!

Upvotes: 1

Views: 1913

Answers (2)

Shlomo Georg Konwisser

Reputation: 443

Quoting from https://developer.yahoo.com/hadoop/tutorial/module4.html:

The data types emitted by the reducer are identified by setOutputKeyClass() and setOutputValueClass(). By default, it is assumed that these are the output types of the mapper as well. If this is not the case, the methods setMapOutputKeyClass() and setMapOutputValueClass() methods of the JobConf class will override these.

Thus setOutputKeyClass() and setOutputValueClass() define the output types for both the mapper and the reducer. If the mapper should have different output types, use setMapOutputKeyClass() and setMapOutputValueClass().
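The fallback described above can be sketched with a toy model. This is not Hadoop code: ToyJobConf, its string keys, and the accepts() helper are hypothetical names made up for illustration, and Integer/Double stand in for IntWritable/DoubleWritable so the example runs without Hadoop on the classpath. It assumes the map-output check is an exact class comparison, which is how the error in the question behaves.

```java
import java.util.HashMap;
import java.util.Map;

public class OutputTypeFallback {

    // Toy stand-in for a job configuration; NOT the real Hadoop JobConf.
    static class ToyJobConf {
        private final Map<String, Class<?>> settings = new HashMap<>();

        void setOutputValueClass(Class<?> c) {
            settings.put("output.value", c);
        }

        void setMapOutputValueClass(Class<?> c) {
            settings.put("map.output.value", c);
        }

        Class<?> getOutputValueClass() {
            return settings.getOrDefault("output.value", Object.class);
        }

        // If no map-specific value class was set, fall back to the
        // job-level (reducer) output value class.
        Class<?> getMapOutputValueClass() {
            Class<?> c = settings.get("map.output.value");
            return c != null ? c : getOutputValueClass();
        }
    }

    // Models the map-side type check as an exact class comparison (assumption).
    static boolean accepts(ToyJobConf conf, Class<?> emittedValueClass) {
        return conf.getMapOutputValueClass() == emittedValueClass;
    }

    public static void main(String[] args) {
        ToyJobConf conf = new ToyJobConf();

        // Mirrors the question: only the reducer's value type is declared.
        conf.setOutputValueClass(Double.class);   // stands in for DoubleWritable
        System.out.println(accepts(conf, Integer.class)); // prints "false"

        // The fix: declare the mapper's value type explicitly.
        conf.setMapOutputValueClass(Integer.class); // stands in for IntWritable
        System.out.println(accepts(conf, Integer.class)); // prints "true"
    }
}
```

In the question's driver, the corresponding change would be a single extra call to setMapOutputValueClass() with IntWritable.class; the reducer's declared output types stay as they are.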

In the current Hadoop version (2.5.1, and also some earlier versions) it is recommended to use the Job class instead of JobConf:

Job job = Job.getInstance(new Configuration());

job.setMapOutputKeyClass(YourOutputKeyClass1.class);
job.setMapOutputValueClass(YourOutputValueClass1.class);

job.setOutputKeyClass(YourOutputKeyClass2.class);
job.setOutputValueClass(YourOutputValueClass2.class);

Concluding from the quote (and from my experience): in a mapper-only job (one without a reducer), setOutputKeyClass() has the same effect as setMapOutputKeyClass(), and likewise setOutputValueClass() has the same effect as setMapOutputValueClass().

Upvotes: 0

davek

Reputation: 22895

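If you don't specify an output value class for your mapper, it defaults to the class given in setOutputValueClass(), i.e. DoubleWritable, which is the output value type of MyReducer. Your mapper emits IntWritable, hence the type mismatch.

You need this:

job.setMapOutputValueClass(IntWritable.class);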

Upvotes: 1
