Mario

Reputation: 33

The "key" parameter of Hadoop map function is not used

I have been trying to learn Hadoop. In the examples I have seen (such as the word count example), the key parameter of the map function is not used at all. The map function only uses the value part of the pair. So the key parameter seems unnecessary, but surely it is there for a reason. What am I missing here? Can you give me example map functions which use the key parameter?

Thanks

Upvotes: 2

Views: 1563

Answers (2)

rhitz

Reputation: 1892

In the word count example, we want to count the occurrences of each word in the file, so we use the following approach:

In Mapper -

Key - Byte offset of the line from the start of the text file.

Value - Line in text file.

For example, file.txt:

Hi I love Hadoop.

I code in Java. 

Here

Key - 0 , value - Hi I love Hadoop.

Key - 18 , value - I code in Java.

(Key 18 is the byte offset from the start of the file: the 17 characters of the first line plus one newline byte.)

Basically, the offset key is what the default input format supplies, and we do not need it, especially in word count.

You can find the rest of the logic here and in many other available links.
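To make the offsets concrete, here is a minimal plain-Java sketch that mimics the (offset, line) pairs a line-oriented input format hands to the mapper. It is not Hadoop code: `LineOffsets` and `offsets` are made-up names, and it assumes UTF-8 text with a single-byte `\n` line terminator. Under that assumption the second line of file.txt starts 18 bytes in (17 characters plus the newline).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: reproduces the (offset, line)
// records that map() would receive for a small text file.
class LineOffsets {
    static Map<Long, String> offsets(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.put(offset, line);
            // Advance past the line's bytes plus the one-byte '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "Hi I love Hadoop.\nI code in Java.\n";
        offsets(file).forEach((key, value) ->
                System.out.println("Key - " + key + " , value - " + value));
    }
}
```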

Just in case:

In Reducer

Key is the word. Each value is 1, one per occurrence of that word, and the reducer sums them to get the count.
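The whole flow can be simulated in plain Java to show where the key is ignored. This is only a sketch: `WordCountSketch` and its methods are hypothetical names, not Hadoop APIs, and the grouping step stands in for Hadoop's shuffle. Note that `map` accepts the offset but never reads it.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of word count; none of these names are Hadoop APIs.
class WordCountSketch {
    // "Map" step: the offset key is accepted but never read.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // "Reduce" step: sum the 1s collected for a single word.
    static int reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return sum;
    }

    // Stand-in for the framework: run map, group by word, then reduce.
    static Map<String, Integer> run(Map<Long, String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<Long, String> record : records.entrySet()) {
            for (Map.Entry<String, Integer> pair : map(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) -> counts.put(word, reduce(word, ones)));
        return counts;
    }
}
```

Feeding it the two lines of file.txt gives a count of 2 for "I" and 1 for every other token, regardless of which offsets are passed in.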

Upvotes: 2

Karthik

Reputation: 1811

To understand how the key is used, you need to know the various input formats available in Hadoop.

  1. TextInputFormat - An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return is used to signal end of line. Keys are the position in the file, and values are the line of text.

  2. NLineInputFormat - Splits the input so that N lines form one split. In many "pleasantly" parallel applications, each process/mapper processes the same input file(s), but the computations are controlled by different parameters (referred to as "parameter sweeps"). One way to achieve this is to specify a set of parameters (one set per line) as input in a control file, which is the input path to the map-reduce application, whereas the input dataset is specified via a config variable in JobConf. NLineInputFormat can be used in such applications: it splits the input file so that, by default, one line is fed as a value to one map task, and the key is the offset, i.e. (k,v) is (LongWritable, Text). The location hints will span the whole mapred cluster.

  3. KeyValueTextInputFormat - An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return is used to signal end of line. Each line is divided into key and value parts by a separator byte. If no such byte exists, the key will be the entire line and the value will be empty.

  4. SequenceFileAsBinaryInputFormat - An InputFormat that reads keys and values from SequenceFiles in binary (raw) format.

  5. SequenceFileAsTextInputFormat - This class is similar to SequenceFileInputFormat, except that it generates a SequenceFileAsTextRecordReader, which converts the input keys and values to their String forms by calling the toString() method.
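With KeyValueTextInputFormat the key carries real data, so a map function has a reason to read it. The plain-Java sketch below imitates only the splitting rule described in item 3; `KeyValueLineSketch` and `split` are hypothetical names, not Hadoop classes, and it splits on the first occurrence of the separator (tab is the default separator in Hadoop).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: applies the key/value splitting
// rule described above to a single line of text.
class KeyValueLineSketch {
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key and the value is empty.
            return new SimpleEntry<>(line, "");
        }
        // Everything before the first separator is the key, the rest is the value.
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> record = split("alice\t/home", '\t');
        System.out.println("key=" + record.getKey() + " value=" + record.getValue());
    }
}
```

A mapper fed such records could, for instance, emit (key, 1) to count lines per user, which is a case where ignoring the key would lose the information you need.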

Upvotes: 2
