hadoop-streaming example failed to run - Type mismatch in key from map

Question

I was running:

    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
        -D stream.map.output.field.separator=. \
        -D stream.num.map.output.key.fields=4 \
        -input myInputDirs \
        -output myOutputDir \
        -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
        -reducer org.apache.hadoop.mapred.lib.IdentityReducer

What should the input file look like when IdentityMapper is the mapper?

I was hoping to see that it can sort on certain selected key fields rather than on the entire key. My input file is simple: "aa bb" and "cc dd". What did I miss? I always get this error:

    java.lang.Exception: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:371)
    Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
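To see where the LongWritable in the error comes from, here is a minimal Python simulation (not Hadoop code) of what the mapper receives, assuming the default TextInputFormat: each record's key is the byte offset of the line within the file and the value is the line itself. IdentityMapper forwards that key unchanged, so the map emits offset keys (LongWritable in Hadoop) where, per the error message, the job expects Text keys. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def text_input_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, line) pairs, roughly as TextInputFormat would.

    The int offset stands in for the LongWritable key, the str for the Text value.
    """
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # next line starts right after this one's bytes


def identity_map(records: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    # Like IdentityMapper: emit every input (key, value) pair unchanged.
    for key, value in records:
        yield key, value


if __name__ == "__main__":
    sample = b"aa bb\ncc dd\n"
    for key, value in identity_map(text_input_records(sample)):
        # The key is a numeric offset, not text, mirroring the LongWritable key.
        print(type(key).__name__, key, repr(value))
    # prints:
    # int 0 'aa bb'
    # int 6 'cc dd'
```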

Praveen Sripati · Accepted Answer

This is a known bug and here is the JIRA. The bug has been identified in Hadoop 0.21.0, but I don't think the fix is in any Hadoop release yet. If you really want to fix this, you can:

download the source code for Hadoop (for the release you are working with)
download the patch from JIRA and apply it
build and test Hadoop

Here are the instructions on how to apply a patch.

Alternatively, instead of using IdentityMapper and IdentityReducer, use Python/Perl scripts that read the k/v pairs from STDIN and write the same k/v pairs to STDOUT without any processing. This effectively gives you your own IdentityMapper and IdentityReducer, written without Java.
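A minimal sketch of such a pass-through script in Python, assuming Python 3 is available on the task nodes; the file name `identity.py` and both functions are hypothetical. The same script can serve as mapper and reducer. `split_key_value` is not needed by the job; it only mimics, for illustration, how streaming splits each map output line (without its newline) into key and value under the two `-D` options in the question: the key is the first four `.`-separated fields, and a line with fewer than four `.` separators becomes the whole key with an empty value. So an input line like "aa bb" ends up entirely in the key.

```python
#!/usr/bin/env python3
"""identity.py (hypothetical name): pass-through script for Hadoop streaming.

Usable as both -mapper and -reducer: it copies every stdin line to stdout.
"""
import sys
from typing import Iterable, Iterator, Tuple


def split_key_value(line: str, separator: str = ".",
                    num_key_fields: int = 4) -> Tuple[str, str]:
    """Illustration only: mimic the key/value split that
    stream.map.output.field.separator and stream.num.map.output.key.fields
    describe. Lines with fewer than num_key_fields separators are all key."""
    parts = line.split(separator)
    if len(parts) <= num_key_fields:
        return line, ""
    key = separator.join(parts[:num_key_fields])
    value = separator.join(parts[num_key_fields:])
    return key, value


def identity(lines: Iterable[str]) -> Iterator[str]:
    # No processing: each line (already "key<sep>value" text) goes out as-is.
    for line in lines:
        yield line


if __name__ == "__main__":
    sys.stdout.writelines(identity(sys.stdin))
```

The script would then be passed to the job with something like `-mapper identity.py -reducer identity.py -file identity.py`, after making the file executable.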
