chengdol

Reputation: 229

Hadoop Mapper parameters meaning

I am new to Hadoop and have a question about the mapper's parameters. Here is the relevant snippet from the word count example:

public static class TokenizerMapper
   extends Mapper<LongWritable, Text, Text, IntWritable> {

   .....

   public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException 
   {
       .......
   }
}

I know the "value" parameter is the line read from the file, but what does the "key" parameter mean? What does it correspond to?

Why is its type LongWritable?

I spent several hours searching the docs without finding an answer. Could anyone help?

Upvotes: 1

Views: 538

Answers (1)

Gyanendra Dwivedi

Reputation: 5538

The key is of type LongWritable because the word count program reads its input with TextInputFormat, which uses each line's byte offset in the file as the key.

As per the Javadoc for TextInputFormat:

An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text.

For example, suppose your text is:

We are fine.
How are you?
All are fine.

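Then the input to the mapper is (assuming single-byte characters and a one-byte newline at the end of each line):

Key: 0 Value: We are fine. (The first line starts at byte offset 0.)

Key: 13 Value: How are you? (The first line is 12 characters plus a newline, 13 bytes in total, so the second line starts at offset 13.)

Key: 26 Value: All are fine. (The second line is also 12 characters plus a newline, so the third line starts at offset 13 + 13 = 26 from the start of the file.)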
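
To check the offset arithmetic without a Hadoop cluster, here is a minimal plain-Java sketch. It does not use Hadoop at all; it only mimics how byte-offset keys come out, assuming UTF-8 text and "\n" line endings. The keys are zero-based byte offsets, so for the three-line text it yields 0, 13 and 26. The class name LineOffsets and the method offsets are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: computes the byte offset at which
// each line starts, which is what the LongWritable key holds for each record.
class LineOffsets {

    // Assumes UTF-8 content with "\n" as the line terminator.
    static List<Long> offsets(String fileContent) {
        List<Long> keys = new ArrayList<>();
        long offset = 0; // the first line starts at byte 0
        for (String line : fileContent.split("\n")) {
            keys.add(offset);
            // advance past the line's bytes plus the one-byte newline
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return keys;
    }

    public static void main(String[] args) {
        String text = "We are fine.\nHow are you?\nAll are fine.\n";
        List<Long> keys = offsets(text);
        String[] lines = text.split("\n");
        for (int i = 0; i < lines.length; i++) {
            System.out.println("Key: " + keys.get(i) + " Value: " + lines[i]);
        }
    }
}
```

A file can be much larger than 2 GB, so the offset needs a 64-bit type, which is why the key is a LongWritable rather than an IntWritable. In word count the mapper simply ignores the key.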

Upvotes: 2
