Reputation: 229
I am new to Hadoop and have a question about the mapper parameters in the word-count example. See the code snippet below:
public static class TokenizerMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    .....
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        .......
    }
}
I know the "value" parameter is the line read from the file, but what does the "key" parameter mean? What does it correspond to?
Why is its type LongWritable?
I have wasted several hours searching the docs; could anyone help?
Upvotes: 1
Views: 538
Reputation: 5538
The key is of type LongWritable because the wordcount program reads its input with TextInputFormat.
As per the Javadoc for TextInputFormat:
An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text.
Here, "position" means the byte offset of the start of the line. For example, suppose your text is
We are fine.
How are you?
All are fine.
Then the input to the mapper is
Key: 0
Value: We are fine.
Key: 13
Value: How are you?
(the first line has 12 characters plus a newline, i.e. 13 bytes, so the second line starts at offset 13)
Key: 26
Value: All are fine.
(the second line is another 13 bytes, so the third line starts at offset 26 from the start of the file)
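You can verify these offsets without running a Hadoop job. Below is a minimal sketch in plain Java (no Hadoop dependencies; the class and method names are made up for illustration) that computes each line's starting byte offset the same way TextInputFormat derives the LongWritable key:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {

    // Compute the byte offset of each line's first character -
    // the value TextInputFormat hands to the mapper as the key.
    static List<Long> offsets(String text) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            result.add(offset);
            // advance past the line's bytes plus the '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "We are fine.\nHow are you?\nAll are fine.";
        List<Long> keys = offsets(text);
        for (int i = 0; i < keys.size(); i++) {
            System.out.println("Key: " + keys.get(i));
        }
        // prints Key: 0, Key: 13, Key: 26
    }
}
```

Note the offsets are byte offsets, not line numbers, which is also why the key type is LongWritable rather than IntWritable: a large file can exceed the range of an int.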
Upvotes: 2