johnsam

Reputation: 4562

hadoop-streaming example failed to run - Type mismatch in key from map

I was running:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -D stream.map.output.field.separator=. \
    -D stream.num.map.output.key.fields=4 \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer

What should the input file be when IdentityMapper is the mapper?

I was hoping to see it sort on selected key fields rather than on the entire key. My input file is simple: "aa bb" and "cc dd". I'm not sure what I missed. I always get this error:

java.lang.Exception: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:371)
Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable

Upvotes: 2

Views: 2316

Answers (2)

Snake Eye

Reputation: 535

I was trying out Hadoop with my own example and got the same error. Using KeyValueTextInputFormat as the input format resolved the issue. You can have a look at the following blog post for details.

http://sanketraut.blogspot.in/2012/06/hadoop-example-setting-up-hadoop-on.html

Hope it helps you.

Peace. Sanket Raut

Upvotes: 0

Praveen Sripati

Reputation: 33495

This is a known bug, and here is the JIRA. The bug was identified in Hadoop 0.21.0, but I don't think the fix is in any released version of Hadoop. If you really want to fix it, you can:

  • download the source code for Hadoop (for the release you are working)
  • download the patch from JIRA and apply it
  • build and test Hadoop

Here are the instructions on how to apply a patch.

Or, instead of using the IdentityMapper and the IdentityReducer, use a Python or Perl script that reads the k/v pairs from STDIN and writes the same k/v pairs to STDOUT without any processing. It's like creating your own IdentityMapper and IdentityReducer without using Java.
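
A minimal sketch of such a pass-through script in Python, in case it helps. The file name identity.py and the function name identity are my own choices for illustration, not anything defined by Hadoop. It assumes streaming hands the script one record per line on STDIN, and it copies each line to STDOUT unchanged.

```python
#!/usr/bin/env python3
"""Pass-through script meant to stand in for a streaming mapper or reducer.

Hypothetical file name: identity.py
"""
import sys


def identity(lines, out):
    """Copy every input line to the output without modifying it."""
    for line in lines:
        out.write(line)


if __name__ == "__main__":
    # Streaming feeds records on STDIN and collects results from STDOUT.
    identity(sys.stdin, sys.stdout)
```

The same script could then be named in both the -mapper and -reducer options in place of the two Java classes, provided it is executable and shipped to the task nodes along with the job.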

Upvotes: 4
