Reputation: 619
When I write a MapReduce program, I often write code like
job1.setMapOutputKeyClass(Text.class);
But why should we specify the MapOutputKeyClass explicitly? We have already specified it in the mapper class, for example
public static class MyMapper extends
Mapper<LongWritable, Text, Text, Text>
In the book Hadoop: The Definitive Guide there is a table showing that the method setMapOutputKeyClass is optional (it is listed under "Properties for configuring types"), but when I tested this I found that it is necessary; otherwise the Eclipse console shows
Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text
Can someone tell me the reason for this?
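For reference, here is a minimal driver sketch of what I run (the paths come from the command line, and MyReducer is just a placeholder for my reducer class); if I comment out the setMapOutputKeyClass line, I get the error above:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Job job1 = Job.getInstance(new Configuration(), "my job");
        job1.setJarByClass(MyDriver.class);
        job1.setMapperClass(MyMapper.class);   // Mapper<LongWritable, Text, Text, Text>
        job1.setReducerClass(MyReducer.class);

        // Without the next two lines, the framework falls back to the job
        // output classes (LongWritable/Text by default) and throws:
        // "Type mismatch in key from map: expected ... LongWritable, received ... Text"
        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path(args[1]));
        System.exit(job1.waitForCompletion(true) ? 0 : 1);
    }
}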
In the book it says: "The settings that have to be compatible with the MapReduce types are listed in the lower part of Table 8-1." Does that mean we have to set the property types in the lower part of the table, but do not have to set the ones in the upper part?
The content of the table looks like this:
Properties for configuring types:
mapreduce.job.inputformat.class
mapreduce.map.output.key.class
mapreduce.map.output.value.class
mapreduce.job.output.key.class
mapreduce.job.output.value.class
Properties that must be consistent with the types:
mapreduce.job.map.class
mapreduce.job.combine.class
mapreduce.job.partitioner.class
mapreduce.job.output.key.comparator.class
mapreduce.job.output.group.comparator.class
mapreduce.job.reduce.class
mapreduce.job.outputformat.class
Upvotes: 3
Views: 3059
Reputation: 4067
setMapOutputKeyClass() and setMapOutputValueClass() are optional as long as they match your job's output types, as specified by setOutputKeyClass() and setOutputValueClass() respectively. In other words, if your mapper's output types do not match your reducer's output types, you have to use one or both of these methods.
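For example, a minimal sketch of the two cases (the Writable types here are only illustrative):
// Case 1: the intermediate (map) types match the final (reduce) output
// types. Declaring the job output types alone is enough, because the map
// output classes default to them:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// Case 2: the mapper emits Text/IntWritable but the reducer emits
// Text/DoubleWritable. Now the intermediate types must be set explicitly:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);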
As for your question about the generic arguments: due to Java type erasure (see "Java generics type erasure: when and what happens?"), Hadoop does not know the type parameters at runtime, even though they are known to the compiler.
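A quick way to see the erasure (a standalone sketch, nothing Hadoop-specific):
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        // Prints "true": both are plain ArrayList at runtime, because the
        // type arguments are erased. Likewise, Hadoop cannot recover
        // <LongWritable, Text, Text, Text> from a Mapper instance at runtime.
        System.out.println(strings.getClass() == ints.getClass());
    }
}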
Upvotes: 8